[57] ABSTRACT

An information system (10) including a plurality of computer
systems is provided. The information system (10) includes a
first computer system (16) having an IBM System/360/370 I/O
interface channel (18). The first computer system (16) is
operable to communicate SNA and non-SNA protocol information
via the IBM System/360/370 I/O interface channel (18). The
information system (10) includes a second computer system (40)
having a SCSI bus (38). The second computer system (40) is
operable to communicate SCSI protocol information via the SCSI
bus (38). An adapter (36) is coupled to the IBM System/360/370
I/O interface channel (18) of the first computer system (16)
and the SCSI bus (38) of the second computer system (40). The
adapter (36) is operable to interface the SCSI bus (38) with
the IBM System/360/370 I/O interface channel (18) to allow
bi-directional communication between the first computer system
(16) and the second computer system (40).

ADAPTER FOR INTERFACING A SCSI BUS
WITH AN IBM SYSTEM/360/370 I/O
INTERFACE CHANNEL AND INFORMATION
SYSTEM INCLUDING SAME

TECHNICAL FIELD OF THE INVENTION

This invention relates in general to the field of electronic
systems, and more particularly to an adapter for interfacing
a SCSI bus with an IBM System/360/370 I/O interface channel
and an information system including such an adapter.

BACKGROUND OF THE INVENTION 

Numerous public and private organizations use information
systems that include two types of computer systems. The first
type of computer system generally comprises IBM® mainframe and
compatible systems supporting SYSTEM NETWORK ARCHITECTURE™
("SNA"), sometimes referred to as legacy systems. These legacy
systems commonly support communication of information via an
IBM System/360/370 I/O interface channel where the information
is formatted according to various protocols including SNA. An
IBM System/360/370 I/O interface channel can comprise a
block-multiplexed or "bus & tag" channel providing
approximately 4.5 megabytes per second of data transmission
bandwidth or an ESCON fiber-optic channel providing
approximately 17.5 megabytes per second of data transmission
bandwidth.

The second type of computer system generally comprises
computer workstations and personal computers such as
UNIX®-based systems, DOS and WINDOWS™ systems, OS/2® systems
and MACINTOSH® systems. These systems commonly support
communication of information via a communication network where
the information is formatted according to a TCP/IP protocol.
Local area networks ("LAN's") and wide area networks ("WAN's")
are often created by interconnecting these systems to form
workgroup environments generally providing bandwidths in the
range of 10-16 megabits per second. Computer workstations and
personal computers also commonly support communication via a
Small Computer Standard Interface ("SCSI") bus to provide
standard connectivity for peripheral devices such as
internal/external hard drives or tape drives.

It is important for an organization having an information
system that includes both of these types of computer systems
to be able to access and integrate information housed in
legacy systems with information distributed throughout
numerous workgroup environments of computer workstations and
personal computers. Bidirectional movement of information and
greater bandwidth are important considerations in providing
this interconnectivity. It is desirable for information to
travel in both directions between a legacy system and a
computer workstation or personal computer and to do so at as
large a bandwidth as possible. Currently, the bandwidth for
such communication of information is limited by the bandwidth
of the communication network to which the computer workstation
or personal computer is connected.
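
The bandwidth gap described above can be made concrete with a little arithmetic. The following Python sketch converts the 10-16 megabit per second workgroup figures to megabytes per second and sets them beside the approximately 4.5 and 17.5 megabyte per second channel figures quoted earlier. The 100 megabyte file size and the function names are illustrative assumptions, and protocol overhead is ignored.

```python
# Illustrative arithmetic only; rates are the approximate figures quoted in the text.
BITS_PER_BYTE = 8


def mbit_to_mbyte(mbit_per_s: float) -> float:
    """Convert megabits per second to megabytes per second."""
    return mbit_per_s / BITS_PER_BYTE


def transfer_seconds(size_mbyte: float, rate_mbyte_per_s: float) -> float:
    """Ideal transfer time in seconds, ignoring protocol overhead."""
    return size_mbyte / rate_mbyte_per_s


links = {
    "LAN, 10 Mbit/s": mbit_to_mbyte(10),
    "LAN, 16 Mbit/s": mbit_to_mbyte(16),
    "bus & tag channel": 4.5,
    "ESCON channel": 17.5,
}

FILE_MBYTE = 100  # hypothetical file size, chosen only for illustration

for name, rate in links.items():
    seconds = transfer_seconds(FILE_MBYTE, rate)
    print(f"{name:18s} {rate:5.2f} MB/s  {seconds:6.1f} s")
```

Under these assumptions a bus & tag channel moves data roughly 2.25 to 3.6 times faster than the workgroup network, which is the gap a direct channel attachment is meant to close.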

SUMMARY OF THE INVENTION 

In accordance with the present invention, disadvantages and
problems associated with prior systems and methods for
interfacing a computer system supporting communication via an
IBM System/360/370 I/O interface channel with computer systems
in multi-vendor local area networks have been substantially
reduced or eliminated.

According to one embodiment of the present invention, an
information system including a plurality of computer systems
is provided. The information system includes a first computer
system having an IBM System/360/370 I/O interface channel. The
first computer system is operable to communicate SNA or channel
protocol information via the IBM System/360/370 I/O interface
channel. The information system includes a second computer
system having a SCSI bus. The second computer system is
operable to communicate SCSI protocol information via the SCSI
bus. An adapter is coupled to the IBM System/360/370 I/O
interface channel of the first computer system and the SCSI bus
of the second computer system. The adapter is operable to
interface the SCSI bus with the IBM System/360/370 I/O
interface channel to allow bidirectional communication between
the first computer system and the second computer system.

According to another embodiment of the present invention, an
adapter for interfacing a SCSI bus with an IBM System/360/370
I/O interface channel is provided. The adapter includes a
channel interface unit operable to couple to an IBM
System/360/370 I/O interface channel. The channel interface
unit is further operable to communicate SNA and IBM channel
protocol information via the IBM System/360/370 I/O interface
channel. The adapter also includes a SCSI interface unit
operable to couple to a SCSI bus. The SCSI interface unit is
further operable to communicate SCSI protocol information via
the SCSI bus. A processor is coupled to the channel interface
unit and to the SCSI interface unit. The processor is operable
to control the channel interface unit and the SCSI interface
unit to allow bidirectional communication between the SCSI bus
and the IBM System/360/370 I/O interface channel.

Technical advantages of the present invention include
providing an adapter for interfacing a SCSI bus with an IBM
System/360/370 I/O interface channel to allow bidirectional
communication taking advantage of the bandwidth of the IBM
System/360/370 I/O interface channel. Because of the wide
availability of SCSI device ports on computer workstations and
personal computers, an adapter constructed according to the
teachings of the present invention benefits numerous
information systems currently used by organizations that
include both SNA and TCP/IP environments.

BRIEF DESCRIPTION OF THE DRAWINGS

A more complete understanding of the present invention and
advantages thereof may be acquired by reference to the
following description taken in conjunction with the
accompanying drawings in which like reference numbers indicate
like features and wherein:

FIG. 1 illustrates one embodiment of an information system
including an adapter for interfacing a SCSI bus with an IBM
System/360/370 I/O interface channel constructed according to
the teachings of the present invention;

FIG. 2 is a block diagram of one embodiment of an IBM®
mainframe connected to a SCSI host via an adapter for
interfacing a SCSI bus with an IBM System/360/370 I/O
interface channel constructed according to the teachings of
the present invention;

FIG. 3 is a block diagram of one embodiment of the adapter for
interfacing a SCSI bus with an IBM System/360/370 I/O
interface channel of FIG. 2;

FIGS. 4A and 4B are block diagrams of one embodiment of
software and firmware operating in the SCSI host and the
adapter for interfacing a SCSI bus with an IBM System/360/370
I/O interface channel of FIGS. 2 and 3;

FIGS. 5A, 5B-1, 5B-2, 5B-3, 5B-4 and 5C-1, 5C-2, 5D, 5E-1,
5E-2, 5F-1, 5F-2, 5G, 5H-1, 5H-2, 5H-3, 5I, 5J-1, 5J-2, 5J-3
and 5K are schematics showing one embodiment of the components
and interconnections for the adapter for interfacing a SCSI
bus with an IBM System/360/370 I/O interface channel of FIGS.
2 through 4; and

FIGS. 6A-1, 6A-2, 6B, 6C-1, 6C-2, 6D-1, 6D-2, 6E-1, 6E-2,
6F-1, 6F-2, 6G-1, 6G-2, 6H, 6I-1, [illegible], 6J-2, 6K-1,
6K-2, 6L-1, 6L-2, 6M-1, 6M-2, 6M-3, 6M-4, 6N-1, 6N-2, and
6O-1, 6O-2 are schematics showing one embodiment of the logic
components and interconnections for a channel controller of
the adapter for interfacing a SCSI bus with an IBM
System/360/370 I/O interface channel of FIGS. 2 through 4.

DETAILED DESCRIPTION OF THE INVENTION

Information System having SNA and TCP/IP environments

FIG. 1 illustrates one embodiment of an information system,
indicated generally at 10. Information system 10 includes a
SYSTEM NETWORK ARCHITECTURE™ ("SNA") environment, indicated
generally at 12, and a TCP/IP environment, indicated generally
at 14. SNA environment 12 includes a mainframe 16. In the
embodiment of FIG. 1, mainframe 16 comprises an IBM® mainframe
using [illegible]. Mainframe 16 has a number of IBM
System/360/370 I/O interface channels 18 accessible through
IBM System/360/370 I/O interface channel ports in mainframe
16.

A front-end processor 20 is coupled to a channel 18, as shown.
Front-end processor 20 is coupled to terminal networks
including an SDLC network 22, an X.25 network 24 and a Token
Ring network 26. Front-end processor 20 can be coupled to
other types of terminal networks as well. A first hardware
gateway device 28 is coupled to SDLC network 22. A computer
system 30 operating as a gateway device and a second hardware
gateway device 32 are coupled to Token Ring network 26. A
third hardware gateway device 34 is coupled to a second
channel 18 of mainframe 16, as shown. In the embodiment of
FIG. 1, first gateway device 28, second gateway device 32 and
third gateway device 34 comprise OC [illegible] available from
OPENCONNECT® SYSTEMS located in Dallas, Tex. In the embodiment
of FIG. 1, workstation 30 comprises a UNIX®-based workstation
[illegible] available from OPENCONNECT® SYSTEMS located in
Dallas, Tex.

An adapter 36 is coupled to a third channel 18 of mainframe
16. Adapter 36 is also coupled to a Small Computer Standard
Interface ("SCSI") bus 38 of computer system 40. SCSI bus 38
is accessible through a SCSI device port of computer system
40. Adapter 36 operates to interface SCSI bus 38 with channel
18 according to the teachings of the present invention.

TCP/IP environment 14 includes a TCP/IP network 42. TCP/IP
network 42 is coupled to communication units of first gateway
device 28, computer system 30, second gateway device 32, third
gateway device 34 and computer system 40. TCP/IP environment
14 includes a first workgroup environment having computer
systems 44 and printer 46, each of which has a communication
unit coupled to TCP/IP network 42. TCP/IP environment 14
includes a second workgroup environment having computer
systems 48 and terminal server 50 supplying a wide area
network ("WAN"), each of which has a communication unit
coupled to TCP/IP network 42. In the embodiment of FIG. 1,
computer systems 44 and computer systems 48 may comprise, for
example, multi-vendor computer workstations and personal
computers.

In operation, mainframe 16 can communicate with computer
systems 44, printer 46, computer systems 48 and terminal
server 50 through first gateway device 28, computer system 30,
second gateway device 32, and third gateway device 34.
However, the bandwidth of this communication is limited by the
bandwidth of SDLC network 22, X.25 network 24, Token Ring
network 26 and TCP/IP network 42.

Mainframe 16 also can communicate with computer system 40
through adapter 36. Computer system 40 can communicate with
computer systems 44, printer 46, computer systems 48 and
terminal server 50 through TCP/IP network 42. Adapter 36 is
constructed according to the teachings of the present
invention and operates to interface SCSI bus 38 with IBM
System/360/370 I/O interface channel 18 to allow bidirectional
high bandwidth communication between computer system 40 and
mainframe 16.

Adapter interfacing SCSI bus with [illegible]

FIG. 2 is a block diagram of one embodiment of mainframe 16
connected to SCSI host 40 via adapter 36 constructed according
to the teachings of the present invention. In the embodiment
of FIG. 2, SCSI host 40 comprises a UNIX®-based computer
workstation. Mainframe 16 has an IBM System/360/370 I/O
interface channel port 51 through which IBM System/360/370 I/O
interface channel 18 is accessible. Adapter 36 has an IBM
System/360/370 I/O interface channel connector 52 coupled to
IBM System/360/370 I/O interface channel port 51. Adapter 36
also has a SCSI bus connector 53. SCSI host 40 has a SCSI
device port 54 through which SCSI bus 38 is accessible and
which is coupled to SCSI bus connector 53. SCSI host 40 has a
communication port 55 coupled to TCP/IP network 42. SCSI host
40 is operable to run software providing an SNA-TCP/IP
protocol translator 57 and a fast file transfer unit 58.

Mainframe 16 is operable to communicate SNA and IBM
System/360/370 I/O interface channel protocol information via
IBM System/360/370 I/O interface channel 18, and SCSI host 40
is operable to communicate SCSI protocol information via SCSI
bus 38. According to the teachings of the present invention,
adapter 36 is operable to interface SCSI bus 38 with IBM
System/360/370 I/O interface channel 18. Adapter 36
communicates information via IBM System/360/370 I/O interface
channel 18 and SCSI bus 38 such that adapter 36 appears as a
peripheral device to SCSI host 40 and as a system supporting
SNA and IBM System/360/370 I/O interface channel protocols to
mainframe 16. In this manner, adapter 36 enables bidirectional
high bandwidth communication between mainframe 16 and SCSI
host 40.

A user of SCSI host 40 can access information housed in
mainframe 16 and can provide information to mainframe 16 using
the full bandwidth capability of IBM System/360/370 I/O
interface channel 18. SNA-TCP/IP protocol translator 57 allows
SCSI host 40 to interpret and transmit information according
to TCP/IP protocol or SNA protocol. In the embodiment of FIG.
2, TCP/IP protocol translator 57 comprises one version of OC
SERVER II software gateway available from OPENCONNECT® SYSTEMS
located in Dallas, Tex. Fast file transfer unit 58 allows SCSI
host 40 to
move bulk files at optimum speed using IBM System/360/370 I/O
interface channel protocols. In the embodiment of FIG. 2, fast
file transfer unit 58 comprises one version of OC/GTO software
available from OPENCONNECT® SYSTEMS located in Dallas, Tex.

In the embodiment of FIG. 3, adapter 36 comprises a
combination hardware and software device contained in a
separate housing and operating to interface SCSI bus 38 of
SCSI host 40 to mainframe 16 via IBM System/360/370 I/O
interface channel 18. Adapter 36 provides IBM System/360/370
I/O interface channel attachment for protocol translator 57
and file transfer unit 58 using a general solution rather than
platform-specific hardware and software. Adapter 36 comprises
a microcomputer system having electronics and logic components
for directly connecting and interfacing SCSI bus 38 with IBM
System/360/370 I/O interface channel 18. Adapter 36 can be
housed in a small enclosure with connectors for attaching to
SCSI bus 38, IBM System/360/370 I/O interface channel 18, and
to an external power supply. In another embodiment of the
present invention, adapter 36 can provide IBM System/360/370
I/O interface channel attachment for a computer system 40
utilizing IBM System/360/370 I/O interface channel ESCON
attachment.

Adapter 36 includes software that works with hardware
components to implement low level SNA and IBM System/360/370
I/O interface channel protocols, one or more logical channel
devices, such as 3274, SCSI devices to interface with SCSI
device drivers on SCSI host 40, administrative functions and
glue functions. Adapter 36 also includes firmware that
provides a bootstrap loader for downloading operating software
from SCSI host 40 across SCSI bus 38. Adapter 36 firmware also
provides a driver for a front panel user interface and a
configuration program to allow user entry of a SCSI ID, which
must be set before SCSI host 40 can contact adapter 36.
Additionally, adapter 36 firmware provides power-on self test,
diagnostics, and development/debugging functions.

SCSI host 40 uses system device drivers and hardware at lowest
layers to attach to adapter 36. In the embodiment of FIG. 2,
SCSI host 40 comprises a UNIX® platform, and protocol
translator 57 comprises OC SERVER II. In this configuration,
datalink software is used to interface to adapter 36. This
software includes an OC SERVER II resident task to implement
an application program interface ("API") for adapter 36 and to
multiplex program units ("PU's"). Where SCSI host 40 comprises
a large computer system, an external UNIX® process may also be
used to multiplex OC SERVER II processes. In the embodiment of
FIG. 2, fast file transfer unit 58 comprises OC/GTO software.
In this environment, changes to a MAMTCP unit in the OC/GTO
software are made in order to implement the API for adapter
36.

As a UNIX® platform, SCSI host 40 includes a daemon process
that provides error logging capability for adapter 36 either
to a file or to an SNMP monitor. This process also provides a
path to adapter 36 for administrative utilities residing on
SCSI host 40 to allow loading and starting of operating
software located in adapter 36, and to provide configuration,
dump and trace administration.

Adapter 36 operates to provide numerous functional advantages.
Normal DC-interlocked and high-speed transfer features of IBM
System/360/370 I/O interface channel 18 can be supported. Data
streaming features of IBM System/360/370 I/O interface channel
18 can also be supported as well as burst data rates equal to
the 4.5 megabytes per second bandwidth of data streaming on
IBM System/360/370 I/O interface channel 18. Adapter 36
operates such that arbitrary limits to the number of
concurrent sessions are not imposed. Available software
products including OC SERVER II and OC/GTO can operate in
conjunction with adapter 36 with insignificant loss of
function. Adapter 36 is also flexible to allow adaptation to
implement additional channel protocols over those currently
implemented.

Block diagram of adapter components

FIG. 3 is a block diagram of one embodiment of adapter 36 for
interfacing SCSI bus 38 with IBM System/360/370 I/O interface
channel 18 of FIG. 2. Adapter 36 includes a processor 60 that
provides a 32-bit CPU, direct memory access ("DMA"), timers, a
DRAM controller, an interrupt controller, parallel ports and
serial ports. In the embodiment of FIG. 3, processor 60
comprises an MC68360 microprocessor. Processor 60 is coupled
to a SCSI controller 62 and to a channel controller 64 via a
bus 61 and respective DMA connections 63. SCSI controller 62
provides an 8-bit CPU bus and a 16-bit DMA bus, and channel
controller 64 provides a 32-bit bus. In the embodiment of FIG.
3, SCSI controller 62 comprises an NCR 53CF96-2 SCSI
controller, and channel controller 64 comprises an XC4006-6
programmable logic component obtained from XILINX®.

SCSI controller 62 is coupled to a SCSI connector 65 which is
operable to connect to SCSI bus 38. Channel controller 64 is
coupled to a bus interface 66 and to a tag interface 68. Bus
interface 66 and tag interface 68 are coupled to channel
connector 69 which is operable to connect to IBM
System/360/370 I/O interface channel 18. In the embodiment of
FIG. 3, IBM System/360/370 I/O interface channel 18 comprises
a block-multiplexed or "bus & tag" channel providing
approximately 4.5 megabytes per second of bandwidth. In other
embodiments, the IBM System/360/370 I/O interface channel
comprises an "ESCON" channel providing approximately 17.5
megabytes per second of bandwidth. A dynamic RAM 70, a static
RAM 72, a programmable ROM 74, and a front panel 76 are
coupled to processor bus 61.

The operation of adapter 36 is controlled by processor 60,
SCSI controller 62, and channel controller 64. SCSI controller
62 interfaces with SCSI bus 38 and channel controller 64
interfaces with IBM System/360/370 I/O interface channel 18.
Processor 60 links SCSI controller 62 with channel controller
64 to enable direct communication between SCSI bus 38 and IBM
System/360/370 I/O interface channel 18. In the embodiment of
FIG. 3, the components of adapter 36 can be mounted on and
housed in a main circuit board, a connector board, a front
panel assembly, an enclosure base with a rear panel, and an
enclosure cover. An external power supply, SCSI bus cables,
bus/tag converter cables, and terminators can be provided
separately.

In the embodiment of FIG. 3, processor 60 of adapter 36
comprises a MOTOROLA® MC68360 and provides a central
processing unit ("CPU") and on-board peripherals. An MC68360
implements a 25 MHz CPU32 as its core processor, which is a
32-bit CPU from the 68000 family. As such, processor 60 is
fully implemented as a 32-bit machine, and on-board
peripherals include a DRAM controller, timers, two independent
DMA controllers, an interrupt controller, serial ports, baud
rate generators, and parallel ports. A serial port resident in
processor 60 can be utilized to support a firmware-based
debugger program. The serial port can be terminated on a main
circuit board of adapter 36 with no provision for a debug port
external to the enclosure. An internally accessible ABORT
switch can also be provided.

In the embodiment of FIG. 3, SCSI controller 62 of adapter 36
comprises an NCR 53CF96-2. As such, SCSI controller 62
provides an interface to SCSI bus 38 including line
drivers/receivers, and handles all SCSI bus protocol. In the
illustrated embodiment, only single-ended drivers and
receivers are implemented.

In the embodiment of FIG. 3, channel controller 64 implements
drivers and receivers to interface to IBM System/360/370 I/O
interface channel 18. Channel controller 64 comprises a
programmable logic component available from XILINX® and is
programmed to implement a handler for low level IBM
System/360/370 I/O interface channel protocol. A 32-byte FIFO
in channel controller 64 along with a dedicated DMA channel
and interrupt provide adapter 36 real time response
capabilities to service IBM System/360/370 I/O interface
channel 18.
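
A rough calculation suggests why the FIFO is paired with a dedicated DMA channel and interrupt. Assuming the channel delivers data at its approximately 4.5 megabyte per second burst rate and nothing drains the 32-byte FIFO, the Python sketch below estimates how long the FIFO takes to fill and how many 25 MHz processor clock cycles that represents. Treating a megabyte as 10^6 bytes is an assumption, and the function name is hypothetical.

```python
FIFO_BYTES = 32               # FIFO depth stated in the text
CHANNEL_BYTES_PER_S = 4.5e6   # assumes 1 megabyte = 10**6 bytes
CPU_MHZ = 25                  # CPU32 core clock stated for the MC68360


def fifo_fill_time_us(fifo_bytes: int, rate_bytes_per_s: float) -> float:
    """Microseconds until an undrained FIFO is full at the given data rate."""
    return fifo_bytes / rate_bytes_per_s * 1e6


fill_us = fifo_fill_time_us(FIFO_BYTES, CHANNEL_BYTES_PER_S)
cpu_cycles = fill_us * CPU_MHZ  # MHz is cycles per microsecond

print(f"FIFO fills in about {fill_us:.1f} microseconds")
print(f"That is about {cpu_cycles:.0f} CPU clock cycles")
```

With only a few microseconds of slack, servicing the FIFO by polling in software would be fragile, which is consistent with the text's reliance on DMA and an interrupt.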

In the embodiment of FIG. 3, adapter 36 comprises a number of
memory components. Dynamic RAM 70 comprises an 8 megabyte,
expandable to 32 megabyte, DRAM (2 M×32-bits). Dynamic RAM 70
is operable to provide general user RAM, buffers, and other
such functions. Static RAM 72 comprises a 512 kilobyte SRAM
(128 K×32-bits) and is operable to provide program storage,
stack space and storage for global variables. Programmable ROM
74 comprises a 128 kilobyte EEPROM (64 K×8-bits) and is
operable to provide firmware and non-volatile configuration
storage. In addition, there are 1536 bytes of system RAM
resident in processor 60.
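
The capacities above follow from the word count and word width of each part. The Python sketch below recomputes them from the organizations given in the text; the class and field names are hypothetical. Note that the 64 K×8-bit organization printed for the EEPROM works out to 64 kilobytes rather than the stated 128 kilobytes, so one of those two figures may be misprinted.

```python
from dataclasses import dataclass

K = 1024
M = 1024 * K


@dataclass(frozen=True)
class MemoryPart:
    name: str
    words: int       # number of addressable words
    width_bits: int  # bits per word

    @property
    def size_bytes(self) -> int:
        """Capacity in bytes: words times width, divided by 8 bits per byte."""
        return self.words * self.width_bits // 8


parts = [
    MemoryPart("dynamic RAM 70", 2 * M, 32),       # 2 M x 32 bits
    MemoryPart("static RAM 72", 128 * K, 32),      # 128 K x 32 bits
    MemoryPart("programmable ROM 74", 64 * K, 8),  # 64 K x 8 bits, as printed
]

for part in parts:
    print(f"{part.name:20s} {part.size_bytes // K:6d} KB")
```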

Front panel 76 provides an operator interface for adapter 36
and can comprise an operator panel including LED's, a multiple
character LCD display, and membrane switches or buttons. The
LED's can be allocated as indicators for power, halt, online,
and operational status, and for CPU, IBM System/360/370 I/O
interface channel and SCSI bus activity. The membrane buttons
can be allocated for RESET, MENU and SELECT functions. In
addition to displaying an operational status of adapter 36,
the LCD display can be used in conjunction with MENU and
SELECT buttons to implement a menuing system for entering
configuration data such as SCSI identification code ("SCSI
ID").

In the embodiment of FIG. 3, front panel 76 can comprise a
modular LED and membrane switch component with a flexible flat
cable connector, a two line by sixteen character LCD display
for status and configuration information, a ribbon cable, and
brackets for mounting to an enclosure base. The LED/switch
component provides buttons for RESET, MENU and SELECT, and
provides LED indicators for POWER, PROCESSOR HALT and BUSY,
CHANNEL and SCSI I/O ACTIVITY, and ONLINE ENABLE and
OPERATIONAL OUT CHANNEL STATUS. The LCD display can operate as
an 8-bit peripheral off main CPU bus 61. The LCD display can
include two lines having an 80-character buffer for storing
ASCII data, of which 16 characters are visible at a time. This
display can include cursor and highlighting capabilities as
well as high level commands for shifting the display. Front
panel 76 also includes a reset button hardwired to a board
reset circuit of adapter 36. MENU and SELECT buttons can be
handled by a parallel port resident in processor 60 with
interrupt generating capability. Front panel 76 further
includes LED's that are indicators of hardware signals with
the exception of ONLINE and PROCESSOR BUSY indicators. ONLINE
can be a software-controlled indicator intended to directly
reflect a state of the ONLINE setting, entered by an operator
via a menuing system for front panel 76. PROCESSOR BUSY can be
automatically set by hardware when processor 60 of adapter 36
awakens from a STOP condition, and can be reset by software
immediately before issuing a STOP instruction.
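
The relationship between the display buffer and the visible window can be pictured with a small model. The Python sketch below is a toy illustration, not the adapter's firmware: it assumes the 80-character buffer is split evenly into 40 characters per line, and the class, method names and sample text are hypothetical.

```python
class LcdModel:
    """Toy model of a two-line LCD with a shiftable 16-character window."""

    LINES = 2
    VISIBLE = 16
    BUFFER_PER_LINE = 40  # assumption: 80-character buffer split over two lines

    def __init__(self) -> None:
        self.buffer = [" " * self.BUFFER_PER_LINE for _ in range(self.LINES)]
        self.offset = 0  # index of the first visible column

    def write(self, line: int, text: str) -> None:
        """Store text in one line of the buffer, truncating or padding it."""
        self.buffer[line] = text[: self.BUFFER_PER_LINE].ljust(self.BUFFER_PER_LINE)

    def shift(self, columns: int) -> None:
        """Move the visible window, clamped so it stays inside the buffer."""
        max_offset = self.BUFFER_PER_LINE - self.VISIBLE
        self.offset = max(0, min(max_offset, self.offset + columns))

    def visible(self) -> list:
        """Return the 16 characters currently shown on each line."""
        end = self.offset + self.VISIBLE
        return [row[self.offset:end] for row in self.buffer]


lcd = LcdModel()
lcd.write(0, "STATUS: ONLINE".ljust(16) + "SCSI ID: 5")
print(lcd.visible()[0])   # first 16 columns
lcd.shift(16)
print(lcd.visible()[0])   # next 16 columns
```

Keeping more text in the buffer than fits on screen lets a menuing system page between status and configuration views by shifting the window rather than rewriting the display.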

All logic components of adapter 36 can reside on a main
circuit board mounted to an enclosure base. A DC power switch
and a five-pin circular DIN connector for DC power can be
right-angle components mounted directly to the main circuit
board. Post-row connectors for ribbon cables can connect front
panel 76. Post-row connectors can also be provided for
attaching a debug console and abort switch. The main circuit
board can include a 96-pin DIN connector for passing all SCSI
and IBM System/360/370 I/O interface channel signals to a
connector circuit board.

A connector circuit board can provide mini-D 50-pin connectors
65 for SCSI-In and SCSI-Out for SCSI bus 38, and 100-pin
hi-density HIFI style connectors 69 for Bus/Tag-In and
Bus/Tag-Out for IBM System/360/370 I/O interface channel 18.
The connector circuit board can mount to a rear panel with
connectors protruding through. The connector circuit board can
also provide a 96-pin DIN connector for direct attachment to a
main circuit board, which supplies electrical signals for SCSI
bus 38 and IBM System/360/370 I/O interface channel 18. Such a
setup is similar to a VME backplane arrangement utilizing
conventional VME parts.

A base of an enclosure for adapter 36 can be a sheet metal
component that includes a rear panel for adapter 36. The rear
panel can have openings for SCSI-In and SCSI-Out connectors
65, and for Bus/Tag-In and Bus/Tag-Out connectors 69. The rear
panel can also have openings for a DC power connector and
switch. A circuit board can mount to the rear panel providing
connectors 65 for SCSI bus 38 and connectors 69 for IBM
System/360/370 I/O interface channel 18. An enclosure cover
can be a sheet metal component providing a top and sides for
adapter 36, and a bezel for front panel 76.

Adapter 36 can use Bus/Tag-In and Bus/Tag-Out assemblies
developed for the Openconnect Systems 3030 hardware gateway
34, to convert from 100-pin high-density HIFI connectors to
standard Bus and Tag serpentine connectors. Cables for
attachment to SCSI host 40 can be provided for common SCSI
connectors such as mini-D 50 to mini-D 50, mini-D 50 to
Centronics, and mini-D 50 to DB-50. An external power supply
can be utilized to provide DC power to adapter 36. Connection
to adapter 36 can be via a 5-pin circular DIN connector
accessible from a rear panel of adapter 36. The power supply
can comprise a wide range model that needs no special
configuration for domestic or international installations, and
a country-specific power cord can be provided separately.

Block diagram of software and firmware

FIGS. 4A and 4B are block diagrams of one embodiment of
software and firmware operating in SCSI host 40 and adapter 36
of FIGS. 2 and 3. Adapter 36 comprises SCSI transport service
100 which includes a SCSI interface, an application program
interface ("API") provider, eight logical units ("LUN's"), and
tape device emulation, as shown. Adapter 36 also comprises
adapter control task 102 which includes dispatch,
administration, path manager, 3274 manager, and file transfer
manager components. Adapter 36 further comprises firmware 104
and channel interface 106. Firmware 104 includes
initialization and control, power-on self test, bus interface
and firmware services. Channel interface 106 includes channel
physical and LDH 3274 and LDH-GTO components. The software and
firmware components of adapter 36 are interconnected as shown
in FIGS. 4A and 4B.
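
Eight logical units fit in a three-bit field, which is how a SCSI-2 initiator names a LUN in its IDENTIFY message. The Python sketch below encodes and decodes that message byte following the SCSI-2 layout (bit 7 set, bit 6 as the disconnect privilege, bits 2-0 as the LUN). It illustrates the standard message format only and does not describe the adapter's firmware; the function names are hypothetical.

```python
IDENTIFY_BIT = 0x80   # bit 7 marks the byte as an IDENTIFY message
DISC_PRIV_BIT = 0x40  # bit 6 grants the target the privilege to disconnect
LUN_MASK = 0x07       # bits 2-0 carry the logical unit number (0-7)


def build_identify(lun: int, allow_disconnect: bool = True) -> int:
    """Return the IDENTIFY message byte for the given LUN."""
    if not 0 <= lun <= LUN_MASK:
        raise ValueError("LUN must be in the range 0-7")
    message = IDENTIFY_BIT | lun
    if allow_disconnect:
        message |= DISC_PRIV_BIT
    return message


def parse_identify(message: int):
    """Return (lun, disconnect_allowed) decoded from an IDENTIFY byte."""
    if not message & IDENTIFY_BIT:
        raise ValueError("not an IDENTIFY message")
    return message & LUN_MASK, bool(message & DISC_PRIV_BIT)


for lun in range(8):
    print(f"LUN {lun}: IDENTIFY byte 0x{build_identify(lun):02X}")
```

Because the field is three bits wide, defining all eight LUNs up front exhausts the address space a single target can expose through this message.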

SCSI host 40 comprises a UNIX® kernel 108 which includes a
SCSI interface, tape devices, and a file system interface.
SCSI host 40 also comprises an administration daemon 110, a
first SNA unit 112, a second SNA unit 114, and a fast file
transfer unit 116. In the illustrated embodiment, SNA units
112 and 114 comprise OCSNA processes, and file transfer unit
116 comprises a MAMTCP (GTO) process established by OC SERVER
II software available from OPENCONNECT® SYSTEMS located in
Dallas, Tex. The software components of SCSI host 40 are
interconnected as shown in FIGS. 4A and 4B.

Functional layers - SCSI bus

SCSI bus 38 comprises a functional layer of adapter 36 and
SCSI host 40. The ANSI specification for SCSI, and SCSI-II
(ANSI X3.131-1990), defines signals, cables, drivers,
connectors, terminators and timings on a SCSI bus. Within this
specification, adapter 36 can implement single ended drivers
and an 8-bit bus. SCSI host 40 and other daisy-chained
peripherals should do the same. Single ended drivers limit the
cable length between SCSI devices to six meters. In adapter
36, a SCSI bus interface is implemented by SCSI controller 62,
and software is involved in setting configuration parameters
such as the SCSI ID. The same is true for SCSI host 40.

Functional layers - SCSI logical

A SCSI logical layer exists in both SCSI host 40 and adapter
36. The SCSI logical layer can operate in a SCSI-1 mode of
operation until SCSI-II has been negotiated from SCSI host 40.
A SYNCHRONOUS DATA TRANSFER REQUEST message is used to
negotiate up to 10 MHz synchronous transfers to support data
streaming on IBM System/360/370 I/O interface channel 18. The
SCSI logical layer can be implemented in adapter 36 by a
software component, running mainly at interrupt level, that
works with SCSI and DMA controllers in adapter 36. Some
functionality can be configurable to accommodate different
types of SCSI host computer systems. SCSI logical layer
functions on SCSI host 40 can be implemented either in
hardware or system device drivers. Some of this functional
behavior can be configurable.

The ANSI specification for SCSI defines the logical protocols
for operating SCSI bus 38, much of which is dependent on
configuration. SCSI devices are classified as Initiators and
Targets. With respect to adapter 36, SCSI host 40 is the
initiator and adapter 36 is the target. Additional targets as
well as initiators may also be present on SCSI bus 38.

At the lower logical level, SCSI implements a finite state
machine consisting of "bus phases" which are implemented

POINTERS, DISCONNECT, INITIATOR DETECTED ERROR, ABORT, MESSAGE
REJECT, NO OPERATION, MESSAGE PARITY ERROR, BUS DEVICE RESET,
IDENTIFY, and SYNCHRONOUS DATA TRANSFER REQUEST. SCSI defines
other messages supporting "linked commands" and "tagged
queuing" that are not significant to the illustrated
embodiment of adapter 36.

Each SCSI device can define up to eight Logical Units
("LUN's"). As shown in FIGS. 4A and 4B, adapter 36 firmware
defines all eight LUN's for potential use since SCSI host 40
may not support dynamic definition of LUN's. To access a LUN
in adapter 36, SCSI host 40 arbitrates for SCSI bus 38,
selects adapter 36, then issues an IDENTIFY message that
specifies a LUN within adapter 36. Some SCSI-1 hosts may skip
this message and pass the LUN in the CDB. A device command to
initiate a data read or a write can then be sent to the LUN in
adapter 36. Unless the LUN is ready to proceed with the data
transfer, it should respond with a DISCONNECT message to
effectively suspend the current command at SCSI host 40 and
get off SCSI bus 38.

If the LUN chose to disconnect, then the LUN arbitrates for
SCSI bus 38 and performs RESELECTION when it is ready to
proceed with the data transfer. The LUN also sends an IDENTIFY
message to SCSI host 40 to identify the reconnecting LUN, so
that the appropriate pointers will be loaded. This achieves
independent operation of the LUN's in adapter 36 and prevents
adapter 36 from locking out other devices on SCSI bus 38. It
is also possible that application buffering schemes may
require that data transfers be interrupted midstream. For
these cases, the LUN disconnects as described above, but first
issues a SAVE DATA POINTER message to the initiator. The
RESTORE POINTERS message is used at RESELECTION time to
instruct the initiator to resume the transfer where it left
off.

Functional layers - SCSI tape device

SCSI tape devices comprise another layer in SCSI host 40 and
adapter 36. Within the ANSI specification for SCSI, chapters
are dedicated to discussion of SCSI commands and status,
generic SCSI devices and sequential access devices (in
chapters 6, 7 and 9, respectively). Provision is also made for
vendor-specific operation. All SCSI tape devices should
operate within this broad definition.

Device drivers on SCSI host 40 implement an initiator
by various combinations and sequences of control signals. role for supported tape devices. Depending on system 
BUS FREE, ARBITRAnON, SELECnON and RESE- 45 implementation, an unrecognized t^ device is typically 
LECTION phases are associated with establishing or supported by a general SCSI tape driver. This feature is 
re-establishing a connection between an initiator and a target exploited to provide applications access from SCSI host 40 
to perfonn an infOTmation transfer. COMMAND, DATA (IN to adapter 36 using the standard file system and system- 
OT OUT), STATUS, and MESSAGE (IN or OUT) are the provided device drivers in UNIX® environments, 
infonnation transfer phases. so SCSI host 40 can issue an INQUIRY command to obtain 

Device level commands arc sent ^om the initiator to the device information from adapta 36. Adapter 36 responds for 
target during COMMAND phase to perfam a device opera- all LUN's that adapter 36 is a SCSI-1 sequential access 
tion or to communicate the desire to enter the DATA phase device supporting synchronous data transfer (as mentioned 
to transfer data. The SCSI specification defines a Command above, SCSI-II operation can be negotiated from SCSI host 
Descriptor Block ("CDB") and device opcodes for conamon 55 40 later using the CHANGE DEFINTIION command). In 
devices and their operations. When a command completes, UNIX® systems, this causes ad^ter 36 to be associated 
the target enters the STATUS phase to report command with the general SCSI tape driver, 
results to the initiator, then enters MESSAGE phase to send Where SCSI host 40 does not support such a generic SCSI 
a command complete message. SCSI defines the following tape device, adapter 36 should include vendor and product 
status codes: G OOD, CHECK CONDmON, CONDHION 60 identification of a supported tape device in response to ttie 
MET, BUSY, INTERMEDIATE, INTERMEDIATE CON- INQUIRY This may include specification of the platform of 
DmON MET, RESERVATION CONFLICT, COMMAND SCSI host 40 through front panel 76 of adapter 36 before 
TERMINATED, and QUEUE FULL. download occurs. 

The MESSAGE phase allows the exchange of control Asynchronous overlapped input/ou^ut (*T/0**) is not a 
messages between the target and initiator to manage path 65 universally available feature of tape devices. Therefore, two 
and data flow. Included are the following: COMMAND SCSI tape devices can be included to implement a fuU- 
COMFLETE, SAVE DATA POINTER, RESTORE duplex path to adapter 36. Application data can be trans- 
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feired to and &om adapter 36 using tape WRITE and tape 
READ commands. Other command handling may be 
nonproductive, incident to the emulation of a SCSI t^e 
device. 

Functional layers — application program interface 

An application program interface ("API") exists in SCSI
host 40 for adapter 36. Logical communication paths are
created between SCSI host 40 and adapter 36 by defining
SCSI tape devices on SCSI host 40 system as shown. Each
tape device on SCSI host 40 maps to tape emulation code
residing in a LUN on adapter 36, and represents a
half-duplex communications path. Two tape devices are used for
full-duplex communications.

An application resident on SCSI host 40 can communicate 
with an application resident on adapter 36 (such as channel 
devices and administrative components) utilizing one or 
more logical communication paths and a simple message 
protocol. Administrative components can be accessed 
through fixed LUN addresses, 0 and 1. Channel devices can
be dynamically configured to reside at any of the remaining 
LUN address pairs. 
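
The LUN layout just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text fixes LUNs 0 and 1 for administrative components, but the pairing of consecutive LUNs (2-3, 4-5, 6-7) and the names `PathAllocator`, `open_path` and `close_path` are hypothetical.

```python
ADMIN_PAIR = (0, 1)   # fixed administrative path, per the text
NUM_LUNS = 8          # adapter 36 defines eight LUNs


class PathAllocator:
    """Hands out LUN pairs for full-duplex channel device paths."""

    def __init__(self):
        # Assumption: pairs are formed from consecutive LUNs above the
        # administrative pair, giving (2, 3), (4, 5) and (6, 7).
        self._free = [(n, n + 1) for n in range(2, NUM_LUNS, 2)]
        self._owners = {}

    def open_path(self, owner):
        """Assign the lowest free LUN pair to a channel device."""
        if not self._free:
            raise RuntimeError("no free LUN pair")
        pair = self._free.pop(0)
        self._owners[pair] = owner
        return pair

    def close_path(self, pair):
        """Return a LUN pair to the free list."""
        del self._owners[pair]
        self._free.append(pair)
        self._free.sort()
```

With eight LUNs and one pair held back for administration, at most three channel device paths can be open at once in this model.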

Standard file system calls can be used by an application on 
SCSI host 40 to access a corresponding device resident on
adapter 36: open, close, read, write, and select if available.
A full-duplex logical path can be opened between SCSI host
40 and adapter 36 by issuing "open" calls to an appropriate
pair of tape devices. For a path other than the fixed path to
administrative components, the "open" calls can be followed
by path control commands to establish the identity of the 
newly created path. Read and write can be used for passing 
path control commands, as well as application data, encap- 
sulated in message frames for adapter 36. 

A message frame for adapter 36 can be used to maintain 
message boundaries for application data in a consistent way 
for all applications. Among other reasons, this allows 
adapter 36 tape emulation code to present a message- 
oriented interface to devices on IBM System/360/370 I/O 
interface channel 18. The message frame can be composed 
of a Message Header, containing the message length,
followed by zero to 64 Kilobytes of application data.
Application-defined fields for sub-path identification and
command can also be included as part of the Message
Header. These fields can help provide a consistent
mechanism for applications that require multiplexing capability.
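
A minimal Python sketch of such a message frame follows. The text states only that the Message Header carries the message length and may carry application-defined sub-path and command fields; the field widths, the byte order, and the names `pack_frame` and `unpack_frame` are assumptions made for illustration.

```python
import struct

# Assumed header layout (not specified in the text): big-endian
# 4-byte data length, 2-byte sub-path identifier, 2-byte command code.
HEADER = struct.Struct(">IHH")
MAX_DATA = 64 * 1024  # the text allows zero to 64 kilobytes of data


def pack_frame(sub_path: int, command: int, data: bytes = b"") -> bytes:
    """Prefix application data with a Message Header."""
    if len(data) > MAX_DATA:
        raise ValueError("application data exceeds 64 kilobytes")
    return HEADER.pack(len(data), sub_path, command) + data


def unpack_frame(frame: bytes):
    """Split a frame into (sub_path, command, data)."""
    length, sub_path, command = HEADER.unpack_from(frame)
    data = frame[HEADER.size:HEADER.size + length]
    if len(data) != length:
        raise ValueError("frame is shorter than its header claims")
    return sub_path, command, data
```

Because the length travels in the header, a receiver can recover message boundaries even though tape READ and WRITE commands only move blocks of bytes.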

Functional layers — LDH 3274

An LDH 3274 channel device interface comprises another
layer in adapter 36. Attachment to IBM System/360/370 I/O
interface channel 18 can be implemented using a channel
device that emulates the IBM® 3274-41A Control Unit. In
FIGS. 4A and 4B, this device is referred to as LDH3274
(Logical Device Handler). LDH3274 manages the connection
and the flow of SNA protocol information between
mainframe 16 and a physical unit ("PU") configured in
adapter 36. There is one LDH3274 instance generated for
each PU configured in OCSNA 112 and 114. In adapter 36,
SCSI bus 38 separates the LDH(s) from the rest of the IBM
System/360/370 I/O interface channel gateway provided by
OCSNA 112 and 114. The PU can communicate with the
LDH(s) using the message protocol for adapter 36 according
to the API for adapter 36.

Since LUN's are limited, a multiplexing scheme can be used for
path resolution. A pair of LUN's, corresponding to two tape
devices, can be allocated as a full-duplex communications path for
multiple PU's/LDH's, possibly from OCSNA processes 112 and 114.
Messages flowing over SCSI bus 38 between LDH3274 and OCSNA
process 112 or OCSNA process 114 in SCSI host 40 carry a Message
Header for adapter 36 as described above. Multiplexing can be
achieved by utilizing a sub-path identifier field in the Message
Header for adapter 36. This identifier can be used by a
multiplexer process on either side of SCSI bus 38 to route SNA
protocol data and control messages to an appropriate destination.
Multiplexer processes comprise intermediate software components
between the PU's and the LDH's.

One sub-path identifier can be reserved for passing control
messages between multiplexer processes for each pair of LUN's
configured for an OCSNA process or an LDH3274. The others can be
available to be assigned to identify configured LDH's. A separate
sub-path for control can allow for the implementation of an
efficient flow control mechanism, since flow information for all
PU's/LDH's on a LUN pair can be passed in a single message.

A flat BTU buffer, with extra space for header information, can be
used to pass SNA protocol data through various SNA layers in
adapter 36. This header space in the BTU buffer can be utilized to
receive the Message Header for adapter 36, which can be added and
stripped away at both ends of SCSI bus 38 without consequence.

The commands listed below are supported by LDH3274 and the
"protocol handler". These commands, when appropriate, may be
passed over SCSI bus 38 in the command field of a Message Header,
and then converted to "ITEM" commands by an LDH interface
component. In addition, the flow control mechanism can create a
new class of commands exchanged between multiplexing components on
either side of SCSI bus 38 that are not seen by the LDH.

PASSBTU       normal message passing
REQMS         request maintenance statistics (no reset)
REQMSR        request maintenance statistics (with reset)
MSRSP         maintenance statistics response
HOSTC         host has sent a connect command
HOSTDC        host has sent a disconnect command
INIZREQ       initialization request from the protocol handler
              w/ iniz parms
STOPREQ       terminate request from the protocol handler
DIBERROR      invalid dib request from ph
HELLO         are you there? request from the protocol handler
HOWDY         texas type response to a hello
INIZRSP       response to an initialization request
STOPRSP       response to a stop request
SNARFED       ldh is unbelievably messed up
INT           a wake up item from the ph
HOSTCRSP      response to HOSTC
MEMREQ        memory allocation request from protocol handler
MEMRSP        memory allocation response to protocol handler
CLOSEREQ      UNIX terminate request from protocol handler
CLOSERSP      UNIX terminate response to protocol handler
RLSBUF        release buffer
MAKEBUFF      allocate a channel buffer, attach it to this item,
              release to channel
HOSTDC_RLS    host sent a disconnect command, plus release this
              buffer

Functional layers — LDH-GTO

An LDH-GTO channel device interface comprises an additional layer
in adapter 36. OC/GTO is a fast file transfer application
available from OPENCONNECT® SYSTEMS located in Dallas, Tex. The
LDH-GTO comprises a non-... protocol communications product that
relies on a pair of channel devices to give mainframe applications
access to TCP/IP sockets. A sockets protocol can be passed across
IBM System/360/370 I/O interface channel 18 between GTO components
residing on mainframe 16 and in SCSI host 40. A GTO channel device
protocol can be based on Mitek Access Method ("MAM") which is a
channel protocol designed for high-speed data transfer.
Consequently, the primary application is for fast file transfer.
GTO can be supported in adapter 36 by two channel devices,
referred to as Logical Device Handlers ("LDH's"), and a component
for interfacing an application in SCSI host 40. One LDH handles
inbound data traffic and the other handles the outbound data
traffic.

Interface protocol to GTO-LDH can be implemented 
using the command set listed below. 


To LDH:

FFT_SDAREQ          Set device address
FFT_RLSSTAT         Give LDH a status buffer for reporting status
FFT_WRITE           Write data to mainframe (inbound dev)
FFT_RLSBUF          Give LDH a buffer to read data from mainframe
                    (outbound dev)
FFT_CLOSEREQ        Request close on FFT LDH. Respond with
                    FFT_CLOSERSP
FFT_MAXBUFSZ_REQ    Tell LDH max buffer size supported

From LDH:

FFT_SDARSP          Response to FFT_SDAREQ
FFT_STAT            LDH's status for FFT device
FFT_WRITE           Write data to mainframe has completed
                    (inbound dev)
FFT_RLSBUF          Read completed, buffer has data from mainframe
                    (outbound dev)
FFT_CLOSERSP        Response to FFT_CLOSEREQ
FFT_MAXBUFSZ_RSP    Response to FFT_MAXBUFSZ_REQ



These commands flow between the GTO-LDH interface
component and the LDH-GTO as ITEM commands over a
MOS path. Some originate or terminate at the GTO-LDH
interface component. Commands that flow information over
SCSI bus 38 are passed in the command field of the Message
Header for adapter 36 during tape READ's and WRITE's
from application MAMTCP on SCSI host 40, and converted
to and from "ITEM" commands by the GTO-LDH interface
component.

Functional layers — administrative path 

An administrative path for adapter 36 comprises a further
layer. A separate administrative path supports logging,
tracing, and debugging from SCSI host 40. This path can
also be utilized for other administrative operations, such as
an initial download for adapter 36, core dumps, local trace
buffer dumps, and configuration and control commands to
adapter 36. A pair of LUN's (corresponding to two tape
devices) is allocated as a full-duplex communications path
for handling all administrative traffic to and from adapter 36:
an Inbound Admin Device (LUN 0) and the Outbound
Admin Device (LUN 1).

Separate software and firmware components exist in
adapter 36 and on SCSI host 40 for executing various
administrative tasks, so that the tasks can be run
concurrently where practical. As an example, logging,
tracing, and a debug session might be run concurrently,
whereas a core dump would need to run alone. The sub-path
identifier field in the Message Header for adapter 36 can be
used to address each Admin component.

Functional layers— bus interface services 

Several services resident as firmware on adapter 36 are
accessible to SCSI host 40 over one sub-path of the
administrative path. These services include load, dump, debugger
and diagnostic services. The debugger includes commands
for viewing local traces. These services are collectively
referred to as Firmware Bus Interface services.

SCSI host 40 formats a control block (with appropriate
Message Header), and writes it to an Inbound Admin
Device. When the command completes or when asynchronous
character data from one of the services is available,
adapter 36 formats a control block (with appropriate
Message Header) that can be read by SCSI host 40 over an
Outbound Admin Device.

In general, the Firmware Bus Interface services are only
available to SCSI host 40 when adapter 36 is under the
control of the firmware. Once adapter 36 is loaded and
started, these services are unavailable until adapter 36
receives a STOP command and returns to firmware mode.

A LOAD can be used at power up and reset to download
operating software into adapter 36. A separate START
command can be used to start the operating software on
adapter 36. A DUMP can be used at the discretion of SCSI
host 40 to retrieve all or part of adapter 36 memory.
Diagnostics should be run at power-up or reset. Adapter 36
includes firmware commands to query for previous fatal
error status, to determine whether a dump is required before
download, and to download/update certain firmware components;
in particular, logic for channel controller 64.
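
The availability rule for the Firmware Bus Interface services can be sketched as a small state machine. This is an illustrative model only: the class `AdapterModel`, the string command names, and the choices that START requires a prior LOAD and that a loaded image survives STOP are assumptions not stated in the text.

```python
from enum import Enum


class Mode(Enum):
    FIRMWARE = "firmware"
    OPERATING = "operating"


# Services the text says are available only under firmware control.
FIRMWARE_ONLY = {"LOAD", "DUMP", "BUGGER", "DIAGNOSTICS"}


class AdapterModel:
    """Toy model of adapter mode transitions (assumed behaviour)."""

    def __init__(self):
        self.mode = Mode.FIRMWARE   # state after power-up or reset
        self.loaded = False

    def handle(self, command: str) -> bool:
        """Return True if the command is accepted in the current mode."""
        if command in FIRMWARE_ONLY:
            if self.mode is not Mode.FIRMWARE:
                return False        # unavailable once started
            if command == "LOAD":
                self.loaded = True
            return True
        if command == "START":
            # Assumption: START needs a previously downloaded image.
            if self.mode is Mode.FIRMWARE and self.loaded:
                self.mode = Mode.OPERATING
                return True
            return False
        if command == "STOP":
            if self.mode is Mode.OPERATING:
                self.mode = Mode.FIRMWARE   # back to firmware mode
                return True
            return False
        raise ValueError(f"unknown command: {command}")
```

In this model a DUMP is refused while the operating software runs and becomes available again only after STOP.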
Functional layers — control path

A control path for adapter 36 comprises a sub-path of
Admin, and is used for sending configuration and control
commands and data, and for retrieving status information for
adapter 36. Adapter 36 can be stopped, via the control path,
in order to reload and/or reconfigure. Firmware Bus
Interface services are available after a STOP is received, until
adapter 36 has been restarted.
Functional layers — logging and tracing

Adapter 36 event messages flow over the logging sub-path
of Admin, from adapter 36 to the Admin Daemon on
SCSI host 40. A separate tracing path (sub-path of Admin)
can be provided to support a comprehensive tracing
capability in adapter 36. Trace configuration and start/stop
commands can be sent to adapter 36 from SCSI host 40. Trace
vectors can be sent from adapter 36 to SCSI host 40 for
decode and display or recording in a trace log file.
Functional layers — channel physical interface protocol

A channel physical interface protocol ("CPIF") comprises
the protocol the LDH uses to communicate with channel
physical ("CP"). Three control blocks, called channel
control areas ("CCA's"), can be passed back and forth between
the LDH and CP and contain fields for command, response,
and status codes. The control CCA can be used by the LDH
to communicate Control Commands to CP and by CP to
communicate responses to those commands back to the
LDH. The command CCA can be used by the LDH to
communicate Interactive Commands to CP and by CP to
communicate Interactive Responses back to the LDH.
Interactive Commands tell CP to interact with IBM
System/360/370 I/O interface channel 18. An asynchronous CCA is
used by CP to notify the LDH that an asynchronous event has
occurred on IBM System/360/370 I/O interface channel 18.

The LDH passes CCAs to CP by calling a "put" function
that processes the CCA within the CP environment while
still running as an LDH task. Likewise, CP passes CCAs to
the LDH by calling a "receive" function that was passed to
CP by the LDH at initialization. The "receive" function may
be called within an interrupt service routine as a result of a
channel event and then "wake up" the appropriate LDH task
to process a CCA. A more complete description of this
interface can be found in the OpenConnect Systems, Inc.
Mitek IBM Communications Controller Programming
Interface (P/N 350-0142-101) document.
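
The "put" and "receive" calling pattern can be illustrated with a short Python sketch. It is not taken from the referenced programming interface document: the classes `Cca`, `ChannelPhysical` and `Ldh`, the field types, and the immediate echo of a response inside `put` are all assumptions chosen to keep the example small.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class CcaKind(Enum):
    CONTROL = "control"
    COMMAND = "command"
    ASYNC = "asynchronous"


@dataclass
class Cca:
    """Channel control area with the three code fields the text names."""
    kind: CcaKind
    command: int = 0
    response: int = 0
    status: int = 0


class ChannelPhysical:
    def __init__(self, receive):
        # The LDH hands CP its "receive" function at initialization.
        self._receive = receive

    def put(self, cca: Cca) -> None:
        # Runs in the caller's (LDH task) context. As a placeholder,
        # the response simply echoes the command code.
        cca.response = cca.command
        self._receive(cca)

    def channel_event(self, status: int) -> None:
        # Stand-in for an interrupt service routine reporting an event.
        self._receive(Cca(CcaKind.ASYNC, status=status))


class Ldh:
    def __init__(self):
        self.pending = deque()
        self.cp = ChannelPhysical(self.receive)

    def receive(self, cca: Cca) -> None:
        # Queueing stands in for waking the LDH task to process a CCA.
        self.pending.append(cca)
```

Passing the callback at construction mirrors the text's point that CP never needs a fixed reference to the LDH; it only calls the function it was given.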
User interface — adapter front panel 
Administration of adapter 36 can primarily be
accomplished using SCSI host 40. However, certain setup items are
administered at adapter 36. Also, adapter 36 includes a
capability of reporting status locally, without relying on
SCSI host 40 or SCSI bus 38 being operational.

Aside from cabling and power, all local administration for
adapter 36 can be performed through front panel 76.
Administration includes the reset switch, hardware status LED's,
and the configuration/status subsystem. The LED's are
allocated as indicators for power, halt, on-line, and
Operational-Out status, and for CPU, IBM System/360/370
I/O interface channel and SCSI bus activity indicators.

The configuration/status subsystem is implemented with
software and firmware components that interact with a user
through two buttons on front panel 76 for "menu" and
"select" functions, and a 2-line by 16-character LCD
display. Internally, each line has an 80-character capacity. Long
lines can be displayed as a marquee display; i.e.,
continuously scrolling the message through the 16-character
window. Normally, the LCD display displays operational status
and a last event log message for adapter 36. This is a default
state which is reentered after a timeout interval of inactivity
for front panel 76.
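
The marquee behaviour can be sketched as below. The window width and line capacity come from the text; the function name `marquee_frames`, the blank gap between repetitions, and the one-character scroll step are assumptions for illustration.

```python
WINDOW = 16     # visible characters per LCD line
CAPACITY = 80   # internal capacity of each line


def marquee_frames(message: str, window: int = WINDOW, gap: int = 3):
    """Return the successive window contents for one full scroll cycle."""
    text = message[:CAPACITY]
    if len(text) <= window:
        # Short lines are shown as-is, padded to the window width.
        return [text.ljust(window)]
    # Assumption: a few blanks separate the end of the message from
    # its next repetition as it wraps around.
    loop = text + " " * gap
    doubled = loop + loop
    return [doubled[i:i + window] for i in range(len(loop))]
```

Displaying the returned frames one after another, and then starting over, scrolls the whole message continuously through the 16-character window.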

In the illustrated embodiment, pushing the MENU button on front
panel 76 activates a menuing system, displaying a top-level menu
on the top display line with the first menu item highlighted, and
the cursor in the first character position of the top line. The
bottom line displays data associated with the currently
highlighted menu item, which is typically another menu. If the
top-level menu items do not fit within the 16-character window,
each subsequent depression of the MENU button would cause the menu
line to scroll to the left by one item. This in turn causes that
item to be highlighted, and its associated data to be displayed on
the bottom line. If the bottom display line is a second-level
menu, its menu items are displayed with the first item on the line
highlighted. Pushing the SELECT button moves the cursor to the
bottom line with the first item "selected". Subsequent depressions
of the SELECT button cause the cursor to move to the next item if
the selections fit completely within the viewing area; otherwise,
they cause the second-level menu items to rotate left by one item
and that item to become "selected". A "selected" second-level menu
item is activated by depressing the MENU button. This action also
causes the cursor to return to the previous menu. Once a selection
has been activated, it remains highlighted and in the first
position each time that particular second-level menu is displayed,
until a new selection has been activated. An escape ("ESC") option
on the second-level menu can be offered to allow return to the
previous menu without activating a new selection.

Additional menu nesting can be supported. If a second-level menu
item represents another menu heading, then activating that item
causes the top line to display the entire "branch" of the
currently selected menu items in the "tree", from the top level to
the current, and the new menu to display on the bottom line. There
is no necessity for limiting the level of nesting that can be
achieved using this technique. For menus nested beyond two levels,
a TOP option can be offered to allow direct return to the
top-level menu.

In the illustrated embodiment, two menu programs are defined. The
first runs from the firmware before download. The second runs
after adapter 36 is downloaded and started. These menu programs
can be identical, except that the second program does not allow
the SCSI ID or Device to be changed and only allows viewing of the
current settings.

User interface — administration from SCSI Host

In general, adapter 36 can be administered from SCSI host 40 with
a UNIX-style configuration file, and commands issued through an
Admin Client to Admin Daemon 110. The configuration file provides
UNIX device-to-adapter path mappings, download image path,
autoload and autostart flags, and identifies a port for Admin
Clients to connect up to Admin Daemon 110. Adapter 36 can be
loaded and started automatically by starting up Admin Daemon 110
with no arguments and the autoload and autostart flags set in the
configuration file. Command line arguments can allow alternate
configuration file specification, or override of individual
configuration parameters.

A character mode Admin Client, working with Admin Daemon 110, can
provide the user interface to adapter 36. In the illustrated
embodiment, the following commands are supported:

Dump           dump adapter 36 memory
Load           download adapter 36 operating software
Start          execute adapter 36 operating software
Stop           stop execution of adapter operating software and
               return to firmware mode
Bugger         invoke the firmware resident debugger mapped to
               stdin and stdout
Diagnostics    run selected adapter diagnostics
...            download adapter 36 configuration
Status         retrieve and display adapter status and
               configuration
Trace          execute specified trace on adapter
Log            display the event log

Software components — adapter firmware

Firmware 104 is EEPROM-resident code that provides startup
routines and low level procedures for interfacing hardware
components.

An initialization and control program controls the flow of
firmware 104 providing the main line of program execution from
power-up or reset. The initialization and control program is
responsible for preliminary initialization, dispatching Power-On
Self Tests, hardware initialization, installation of interrupt
vectors and other firmware components, and initialization of
global variables. The firmware control program also sets up a
vector table that provides soft linkage to hardware and software
components. This allows the downloaded software to access hardware
components and firmware entry points without needing hardcoded
addresses.

A power-on self test ("POST") program operates to detect the
power-on condition and run self tests to verify that hardware
components of adapter 36 are functioning properly.

The debugger, or "bugger" for short, is a firmware
component with facilities for examining and modifying memory
and CPU registers, single-stepping, breakpoints, assembler,
disassembler, and other such functions. There are additional
facilities for the examination of local component traces for
channel and SCSI interfaces, and for running diagnostics. A
default debug console is a dumb ASCII terminal attached to
a serial port of adapter 36. The bugger runs at interrupt level,
driven by keystrokes from the serial port. A firmware
command allows the bugger to switch over from the serial
port to SCSI bus 38 for its input and output.

A bus interface services firmware component provides
administrative and debug services to SCSI host 40 as
mentioned above. These services include load and dump
capability, as well as access to the bugger and diagnostics.
The bus interface services are started by the firmware
control program after all initialization is complete. SCSI
devices LUN 0 and 1 are initialized as read and write paths
for the bus interface services. The Admin Control component
of adapter 36 operating software can later take over
LUN's 0 and 1.

A firmware services component provides services to
adapter 36 operating software, or other firmware
components, to insulate it from the hardware. Included are
drivers for the serial port and front panel 76. The firmware
services are accessed via trap instruction and processor
registers. A serial port driver allows software components to
display messages to the debug console. A front panel driver
includes a low level and a high level interface. The low level
interface allows software components to directly write
messages to either line of the LCD display and to independently
read and/or reset stored button states. Additionally, the low
level interface can allow installation of service routines for
each button. The high level interface implements the menu
system described above.

The firmware control program installs a menu program to
handle the front panel before operating software is
downloaded. This program is primarily concerned with
configuration parameters to operate the SCSI interface.

SCSI transport service 100 comprises a firmware component
that provides the communications pipe between
adapter 36 and SCSI host 40. This component is later
replaced by a downloaded software component by the same
name. SCSI transport service 100 is composed of the three
components, a SCSI interface, tape emulation and SCSI
transport API, which are described in more detail below.

Software components — adapter operating software

Operating software is downloaded to adapter 36 over
SCSI bus 38 for the purpose of implementing channel
devices. In the illustrated embodiment, software for
implementation of 3274 devices comprises OC SERVER II and
OC/GTO software available from OPENCONNECT® SYSTEMS
located in Dallas, Tex. This downloaded software
utilizes OpenConnect Systems, Inc.'s Mitek Operating System
("MOS") for dispatch and interprocess communication
services in a multitasking environment.

SCSI transport service 100 comprises a software component
that provides the communications pipe between adapter
36-resident applications and SCSI host 40. This service is
originally resident in firmware and is subsequently replaced
with a downloaded version. SCSI transport service 100
comprises three components as described below.

The first is a SCSI interface and is also referred to as a
SCSI protocol handler. The SCSI interface works with SCSI
and DMA controllers in hardware to implement the SCSI
logical functionality described above. The SCSI interface
provides services to the tape emulation component through
a defined interface. The SCSI interface comprises interrupt
routines to provide real-time response to SCSI bus 38, and
SCSI primitive functions for the tape emulation layer. The
SCSI interface can operate as either a SCSI-1 or SCSI-II
target.

A tape emulation component, which is also referred to as
the SCSI device handler, emulates eight independent tape
devices (LUN 0-7) as described above. Each tape device
defines itself to SCSI host 40 as an ANSI standard half-inch
nine-track SCSI tape controller. Each device constitutes an
independent logical unit with separate state logic and work
area to achieve and handle low-level logical functions.

The third component is a SCSI transport API and provides
asynchronous, message-oriented Read/Write services to
adapter 36-resident applications. The SCSI transport API
component relies on the tape emulation layer to filter out all
SCSI activity other than Reads, Writes and Resets.
Application-provided routines are dispatched for notification
of certain events, such as input/output ("I/O") completion
and reset.

Adapter control task 102 comprises a MOS task responsible
for interfacing the channel devices and administrative
services to SCSI host 40, and for any multiplexing that may
be required to implement the interface. Each device type
(LDH 3274, LDH GTO) has an interface/manager component
residing in adapter control task 102, with a MOS path
generated to each potential device. The manager components
utilize SCSI transport service 100, according to its
API, to read and write SCSI bus 38, and convert interrupt
events to MOS queue events.

Adapter control task 102 is also home to Admin control
and all the administrative functions in adapter 36 that are not
provided by firmware. Admin control also uses the SCSI
transport service 100 for accessing SCSI bus 38.
Additionally, a path management service resides in adapter
control task 102 to listen for incoming PATH OPEN commands
so that an appropriate LDH Manager can be assigned
to a given path. The main line of execution in adapter control
task 102 is a dispatch function that responds to MOS queue
events to dispatch the appropriate adapter 36 application
(such as LDH Manager or Admin).
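
A dispatch loop of this kind might look like the following sketch. The MOS queue interface is not described in the text, so Python's standard `queue.Queue` stands in for it, and the names `AdapterControlTask`, `register`, `post` and `run_pending` are hypothetical.

```python
import queue


class AdapterControlTask:
    """Toy dispatch function driven by queued events (assumed design)."""

    def __init__(self):
        self.events = queue.Queue()   # stand-in for a MOS event queue
        self.handlers = {}            # application name -> callable

    def register(self, name, handler):
        """Attach an application such as an LDH Manager or Admin."""
        self.handlers[name] = handler

    def post(self, name, payload):
        """Queue an event for the named application."""
        self.events.put((name, payload))

    def run_pending(self) -> int:
        """Dispatch queued events; return how many were handled."""
        handled = 0
        while True:
            try:
                name, payload = self.events.get_nowait()
            except queue.Empty:
                return handled
            handler = self.handlers.get(name)
            if handler is not None:   # unknown targets are dropped
                handler(payload)
                handled += 1
```

Events for applications that have not registered are silently discarded here; a real task would more likely log them.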

A MGR3274 component implements the multiplexing
protocol described above with respect to LDH 3274 channel
device interface. MOS paths are generated to each generated
LDH3274 task for routing ITEM's between the multiplexer
and each LDH. Routing is based on adapter 36 logical path
and the sub-path ID field and command code in adapter 36
Message Header. The routing data is created and managed
by MGR3274 in response to path assignments from the path
manager, and from LDH OPEN and CLOSE commands
from SCSI host 40 applications received over these paths.
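
The routing rule described above can be sketched as follows. This is a hedged illustration, not the MGR3274 implementation: the reserved value `CONTROL_SUB_PATH = 0`, the class `Multiplexer` and its method names are assumptions; the text says only that one sub-path identifier is reserved for control messages and that the others identify configured LDH's.

```python
CONTROL_SUB_PATH = 0   # assumed value of the reserved control sub-path


class Multiplexer:
    """Routes messages on one LUN pair by sub-path identifier."""

    def __init__(self):
        self._routes = {}        # sub-path id -> handler for one LDH
        self.control_log = []    # control messages seen on this path

    def assign(self, sub_path: int, handler) -> None:
        """Record a route, e.g. in response to an LDH OPEN command."""
        if sub_path == CONTROL_SUB_PATH:
            raise ValueError("control sub-path is reserved")
        self._routes[sub_path] = handler

    def release(self, sub_path: int) -> None:
        """Drop a route, e.g. in response to an LDH CLOSE command."""
        self._routes.pop(sub_path, None)

    def route(self, sub_path: int, command: int, data: bytes) -> bool:
        """Deliver one message; return False if no route exists."""
        if sub_path == CONTROL_SUB_PATH:
            # Control traffic (such as flow control) stays with the
            # multiplexer and is never seen by an LDH.
            self.control_log.append((command, data))
            return True
        handler = self._routes.get(sub_path)
        if handler is None:
            return False
        handler(command, data)
        return True
```

Keeping control traffic on its own sub-path means a single message can carry flow information for every LDH sharing the LUN pair.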

MGR3274 can be assigned up to three full-duplex paths
(LUN's 2-7), for communicating with SNA processes on
SCSI host 40. A buffer pool is allocated for each inbound
path to receive transmissions for all LDH's associated with
the path. A flow control protocol is executed over each path
to achieve logical independence of buffer resources among
LDH's. This protocol is also executed over each outbound
path to allow the multiplex process running on SCSI host 40
to control its buffer usage.
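
One way to obtain the logical independence of buffer resources mentioned above is a per-LDH quota on a shared pool. The sketch below is an assumption made for illustration only: the specification does not state how the flow control protocol is implemented, and the class name, the quota rule and the method names are hypothetical.

```python
class SharedPool:
    """Shared buffer pool with a fixed per-LDH quota (illustrative only)."""

    def __init__(self, total, per_ldh_limit):
        self.free = total           # buffers not currently held by any LDH
        self.limit = per_ldh_limit  # most buffers one LDH may hold at once
        self.in_use = {}            # LDH name -> buffers currently held

    def acquire(self, ldh):
        """Grant one buffer unless the pool is empty or the LDH is at quota."""
        used = self.in_use.get(ldh, 0)
        if self.free == 0 or used >= self.limit:
            return False
        self.free -= 1
        self.in_use[ldh] = used + 1
        return True

    def release(self, ldh):
        """Return one buffer previously granted to the LDH."""
        if self.in_use.get(ldh, 0) == 0:
            raise ValueError(f"{ldh} holds no buffers")
        self.in_use[ldh] -= 1
        self.free += 1
```

With a pool of four buffers and a quota of two, one busy LDH can never starve a second LDH that shares the same path.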

A GTO-LDH interface provides the LDH interface component
described above with respect to the GTO-LDH
channel device interface. The GTO-LDH interface is
assigned a full-duplex logical path from the Path Manager in
response to a PATH OPEN command from MAMTCP
running on SCSI host 40. This path is used for communicating
status and data across SCSI bus 38; one for the
inbound LDH and one for the outbound LDH. A buffer pool
is used for receiving inbound transmissions.

Upon initial entry, Admin Control takes over control of
LUN's 0 and 1 from the firmware ("F/W") bus interface. The
F/W bus interface then becomes one of several end points
for routing messages that flow over adapter 36 administrative
path. Configuration and status messages can be handled
directly by Admin Control.

Adapter control task 102 includes a logging manager and
a trace manager as referenced above. The logging manager
writes important system events to front panel 76 and to the
logging daemon on SCSI host 40.

LDH3274 is a channel device that emulates the IBM
3274-41A Control Unit. Implemented as a MOS task, each
generated LDH3274 manages the connection and flow of
SNA data between mainframe 16 and one PU configured in
SCSI host 40. The services of channel physical are used to
access the Input/Output ("I/O") sub-channel addresses of
mainframe 16. MGR3274 in adapter 36 adapter control task
102 is used to present the anticipated interface.

LDH-GTO uses separate sub-channel addresses for
inbound and outbound data transfers which are handled by
the inbound LDH (ldhbilly) and the outbound LDH
(ldhnanny). LDH-GTO devices implement a channel
protocol, based on OpenConnect Systems, Inc. Mitek
Access Method ("MAM"), for achieving high-speed data
transfers. The services of channel physical are used to access
the Input/Output ("I/O") sub-channel addresses of mainframe
16. The LDH-GTO interface component in adapter 36
adapter control task 102 presents the anticipated interface.

Channel interface

Channel interface 106 includes a Channel Physical which
comprises a manager of the IBM System/360/370 I/O interface
channel input/output ("I/O") communications for
adapter 36. Channel Physical acts as the intermediary
between each LDH (e.g., LDH3274 and LDHGTO) and
IBM System/360/370 I/O interface channel 18. Channel
Physical manages input from the LDH's, schedules work on
the channel, and transfers requests and data between the
LDH's and IBM System/360/370 I/O interface channel 18.

Communications between IBM System/360/370 I/O
interface channel 18 and Channel Physical occurs at adapter
36 channel hardware interface via interrupts and registers.
When a channel event occurs, an interrupt invokes a Channel
Physical interrupt service routine which in turn reads
channel hardware status registers to determine the event and
appropriate action. Channel Physical initiates activity on the
channel by writing to the appropriate channel hardware
registers. The Channel Physical interface to IBM System/
360/370 I/O interface channel 18 complies with the specifications
in "IBM® System/360 and System/370 I/O interface
channel to Control Unit Original Equipment
Manufacturers' Information", GA22-6974-08, File No.
S360/S370-19. Communications between Channel Physical
and the LDH's are achieved through the use of channel
physical interface protocol ("CPIP") as mentioned above.

Software components - SCSI host Software

SCSI tape drivers on SCSI host 40 provide access to SCSI
bus 38 for applications using adapter 36 as described above.
These drivers are supplied by the operating system with
platform-dependent installation requirements. The adapter
API library is used by UNIX® applications to interface to
adapter 36. This library includes calls for creation and
deletion of a full-duplex path to adapter 36, as well as calls
for passing data to and from adapter 36.

Since UNIX systems, in general, do not support the
SELECT function for tape devices and do not honor non-blocking
requests, the adapter API library can need to fork
separate processes for the read and the write path and utilize
shared memory for passing data. Since pipes support
SELECT, they may be utilized for control. Pipes may also be
considered for passing data, since the implementation can be
much simpler than a shared memory implementation.

The SCACPI in first OCSNA process 112 and in second
OCSNA process 114 is the data link component and is
responsible for interfacing adapter 36 and multiplexing PU's
over a full-duplex path to adapter 36. The SCACPI implements
multiplexing protocol and is the counterpart to
MGR3274 in adapter 36. For SCSI host 40 systems with
three or fewer concurrent OCSNA processes, SCACPI can
utilize the adapter API library for access to adapter 36. For
large systems, SCACPI can be configured to go through an
OCSNA MUX process. SCACPI coexists with other data
link components in the data link task by utilizing a new
multiplexing layer that interfaces the NUC. SCACPI may be
implemented as a single data link task.

The OCSNA MUX process is a separate UNIX® process
responsible for multiplexing LDH3274 traffic from multiple
OCSNA processes over a pair of SCSI tape devices. The
OCSNA MUX process is used for systems running more
than three concurrent OCSNA processes through adapter 36
(or GTO and more than two concurrent OCSNA processes).
An API utilizing shared memory and PIC is defined to
interface the OCSNA processes 112 and 114. The multiplexing
protocol described above with respect to the LDH 3274
channel device interface is used to access MGR3274 in
adapter 36.

MAMTCP is a SCSI host 40 workstation application that
provides "sockets" access to mainframe applications.
Communication is achieved using reads and writes over
the full-duplex path provided by the adapter API library.
Adapter 36 Message Header is utilized to multiplex input/
output and control functions with data using only the read
and write functions.

The Admin Daemon 110 is a UNIX® process that provides
the path for all adapter 36 administration from SCSI
host 40 using the adapter API. Several administrative functions
can be provided directly by Admin Daemon 110,
including load, start and event logging. Admin Daemon 110
parses command line arguments and a configuration file for
its base input. In addition, Admin Client sessions are supported
over socket connections for initiating additional
administrative operations. Admin Daemon 110 also provides
a SNMP agent component to support certain administrative
functions from a SNMP manager.

Admin clients connect to Admin Daemon 110 over socket
connections for manual or interactive adapter 36 administration.
Admin Daemon 110 provides an ASCII text command
set. This allows a generic Admin client to be developed
that supports all Admin operations by simply mapping
STDIN and STDOUT to Admin Daemon 110 connection. In
addition, this client could provide a GUI interface to modify
the configuration file. Shell scripts invoking the generic
Admin Client can be developed to provide individual Admin
utility functions such as "load" and "dump".

Buffers and Data Flow

With respect to outbound SNA data flow, on adapter 36,
each LDH3274 allocates a pool of buffers for receiving
outbound data from IBM System/360/370 I/O interface
channel 18. As outbound data is received into these buffers,
they are sent to MGR3274 to be sent across SCSI bus 38.
They are held up in MGR3274 until a READ has been
received from the outbound tape device, and the flow control
mechanism indicates that the corresponding PU can receive
the data. At this time, the data can be sent across SCSI bus
38 and the buffer is subsequently released to the LDH. If the
LDH runs out of buffers, the channel enters a "slowdown"
mode.

On SCSI host 40 system, SCACPI in OCSNA process 112
or 114 allocates a single pool of buffers to receive transmissions
from adapter 36 outbound to all PU's in OCSNA
process 112 or 114. SCACPI tries continuously to read the
outbound tape device, utilizing the "select" function to avoid
blocking. As outbound data is received, the buffers are
routed to the appropriate PU, and eventually released back
to SCACPI. A flow control mechanism between SCACPI in
OCSNA process 112 or 114 and MGR3274 on adapter 36
prevents any single PU from using more than its share of
buffers.
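
The outbound SNA buffer handling on adapter 36 described above (buffers held until a READ arrives and the PU can receive, then released, with a "slowdown" mode when the pool is exhausted) can be modeled with a small sketch. The class, method and attribute names are hypothetical, and the model is simplified: a READ that cannot be satisfied is simply reported as returning nothing rather than being kept pending.

```python
from collections import deque


class OutboundLdh:
    """Toy model of one LDH3274 buffer pool and the hold queue in MGR3274."""

    def __init__(self, pool_size):
        self.free_buffers = pool_size  # buffers the LDH can still fill
        self.held = deque()            # filled buffers waiting in MGR3274

    @property
    def slowdown(self):
        # The channel enters "slowdown" when the LDH has no free buffers.
        return self.free_buffers == 0

    def receive_from_channel(self, data):
        """Accept outbound data from the channel if a buffer is free."""
        if self.slowdown:
            return False
        self.free_buffers -= 1
        self.held.append(data)
        return True

    def on_read(self, pu_ready):
        """Host issued a READ: send one held buffer if the PU can receive."""
        if not self.held or not pu_ready:
            return None
        data = self.held.popleft()
        self.free_buffers += 1  # buffer is released back to the LDH
        return data
```

Releasing the buffer only after the data has been handed to the host is what lets the slowdown condition clear itself as the host drains the queue.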
With respect to inbound SNA data, on SCSI host 40,
SCACPI within OCSNA process 112 or 114 receives buffers
of inbound data from PU's. The source application and
owner of the buffer can be irrelevant. If the inbound path can
be written without blocking, and the flow control mechanism
indicates that the corresponding LDH can receive the
[illegible] transmitted over SCSI bus 38. The buffer is then
released [illegible] handling the case of inbound buffer
depletion.

On adapter 36, MGR3274 allocates a single pool of
buffers for all the LDH's associated with a given OCSNA
process 112 or 114. As long as buffers are available,
MGR3274 attempts to read the inbound path for that
OCSNA process 112 or 114. As inbound data is received, the
buffers are routed to the appropriate LDH for transmission
over IBM System/360/370 I/O interface channel 18, and
eventually released back to MGR3274.

The same flow control mechanism used for outbound
transmissions is used for inbound transmissions, preventing
any single LDH from using more than its share of buffers.
For both directions of flow, adapter 36 Message Protocol
allows transmissions to be blocked-up over SCSI bus 38 for
improved performance.

With respect to file transfer data, on adapter 36, the
GTO-LDH interface component in adapter control task 102
allocates all the buffers. GTO-LDH interface keeps some for
inbound transmissions and passes some for outbound transmissions.
On SCSI host 40, MAMTCP allocates a buffer
pool to be used for both channel and TCP/IP.

Outbound data is received by ldhnanny who passes the
buffer to GTO-LDH interface. Outbound buffers are held up
here until a READ has been received from the outbound tape
device. At this time, the data can be sent across SCSI bus 38
and the buffer subsequently released to the LDH. If the LDH
runs out of buffers, the channel enters a "slowdown" mode.
As long as buffers are available, MAMTCP on SCSI host 40
attempts to continuously read the outbound path, utilizing
the "select" function to avoid blocking.

MAMTCP writes inbound data to the inbound path when
the "select" function indicates that blocking can not occur,
causing data to be transmitted over SCSI bus 38. As long as
buffers are available, GTO-LDH interface attempts to read
the inbound path. As inbound data is received, the buffers are
passed for transmission over the channel, and eventually
released back to GTO-LDH interface.

A single SCSI host 40 may require more than one adapter
36 to address performance issues, fault tolerance, or to
connect to multiple mainframes. This can be accomplished
by setting up SCSI host 40 with additional tape devices at a
different target address. This is platform dependent. SCSI
host 40 applications are started with the appropriate device
names to direct them to the correct target adapter and
LUN's. The only special requirements for handling more
than one adapter 36 are associated with Admin Daemon 110
and its configuration file. An Admin Daemon 110 should be
started for each attached adapter 36 with a command line
parameter specifying its unique configuration file. Each
configuration file should specify the device mapping for
adapter 36, and provide a unique port address for client
connections to Admin Daemon 110.

Schematics for adapter

FIGS. 5A, 5B-1, 5B-2, 5B-3, 5B-4, 5C-1, 5C-2, 5D, 5E-1,
5E-2, 5F-1, 5F-2, 5G, 5H-1, 5H-2, 5H-3, 5I, 5J-1, 5J-2, 5J-3
and 5K are schematics showing one embodiment of the
components and interconnections for adapter 36 of FIGS. 2
through 4. FIG. 5A shows a list of building materials and a
cross-reference chart.

FIGS. 5B-1, 5B-2, 5B-3 and 5B-4 are schematics showing
processor 60 of FIG. 3 including a number of devices
interconnected as shown. Processor 60 comprises a 68360
QUICC, which is a MOTOROLA® 32-bit processor, and
functions as the central processing unit for adapter 36.
Device 120 comprises a 25 MHz crystal oscillator and
operates to provide timing signals. Device 122 comprises an
[illegible] the central processor. Device 124 comprises an
[illegible] chip and provides signal conversion from
plus/minus 12 volt signals to standard TTL CMOS
signals which are used by processor 60.

FIGS. 5C-1, 5C-2 are schematics showing memory facilities
used by processor 60 including a number of devices
interconnected as shown. Devices 126 comprise static random
access memory chips, each holding 32-bits x 128 K for
a total of 512 kilobytes of memory storage space. Device
128 comprises a single inline memory slot which can hold
from 4 megabytes to 64 megabytes of dynamic random
access memory. Device 130 comprises a flash EEPROM,
and device 132 comprises a driver circuit for device 130.
Device 130 is operable to store debugger code, initialization
code and various other boot code. Device 130 also maintains
the XILINX® image used at power-up to program channel
controller 64 which comprises an XC4006-6 programmable
gate array from XILINX®.

FIG. 5D is a schematic showing SCSI controller 62 of
FIG. 3 including a number of devices interconnected as
shown. SCSI controller 62 provides an interface between
processor 60 and SCSI bus 38. In the illustrated
embodiment, SCSI controller 62 comprises an NCR
35CF96-2.

FIGS. 5E-1, 5E-2 are schematics showing a number of
devices interconnected as shown. A boundary scan connector
134 is shown and is operable to provide testing. A reset
connector 136 is shown and operates as an internal reset
generator used for testing during manufacturing. An LCD
interface 138 is shown and operates to drive the LCD display
of front panel 76. A front panel interface 140 is shown and
operates to provide a connector to the buttons and LED's of
front panel 76. Device 142 comprises an LED driver. Device
144 comprises a four-bit counter. Device 146 comprises an
input/output ("I/O") controller. Together device 144 and
device 146 operate as a state machine for the LCD and idle
LED on the front panel 76.

FIGS. 5F-1, 5F-2 are schematics showing channel controller
64 of FIG. 3 including a number of devices interconnected
as shown. FIGS. 6A-1, 6A-2, 6B, 6C-1, 6C-2, 6D-1,
6D-2, 6E-1, 6E-2, 6F-1, 6F-2, 6G-1, 6G-2, 6H, 6I-1, 6I-2,
6J-1, 6J-2, 6K-1, 6K-2, 6L-1, 6L-2, 6M-1, 6M-2, 6M-3,
6M-4, 6N-1, 6N-2, and 6O-1, 6O-2, as described below, are
schematics showing the internal logic components and interconnections
of channel controller 64. Channel controller 64
comprises a programmable gate array from XILINX®.
Device 148 comprises a 256x4-bit SRAM used for address
decoding and channel selection addressing. Device 150
comprises an 18 MHz crystal.

FIG. 5G is a schematic showing the structure of bus
interface 66 of FIG. 3. Device 152 comprises a parity
generator. Device 154 comprises a 10-bit register. Devices
156 comprise tristate registers operable to control whether or
not data is going out bus interface 66 or data is looped back
for testing purposes. Device 158 comprises a parity generator
that is used to check for bus-out parity. Devices 160
comprise physical signal drivers going out to IBM System/
360/370 I/O interface channel 18 and operating physically to
drive IBM System/360/370 I/O interface channel bus 18.
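
The bus-out parity check performed by a parity generator such as device 158 can be illustrated in software. The sketch below assumes odd parity over eight data bits; that convention and the function names are assumptions made for this example, since the passage above does not state the parity sense.

```python
def odd_parity_bit(byte):
    """Return the parity bit that makes the total count of 1 bits odd."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    ones = bin(byte).count("1")
    # If the data already has an odd number of ones, the parity bit is 0.
    return 0 if ones % 2 == 1 else 1


def bus_out_parity_ok(byte, parity_bit):
    """Check a received data byte against its accompanying parity bit."""
    return odd_parity_bit(byte) == parity_bit
```

Under this assumption, a data byte of 0x03 (two 1 bits) is valid only when it arrives with a parity bit of 1.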
FIGS. 5H-1, 5H-2, 5H-3 are schematics showing tag
interface 68 of FIG. 3 including a number of devices
interconnected as shown. In general, FIG. 3 bus interface 66
provides a data interface between channel controller 64 and
IBM System/360/370 I/O interface channel 18, and tag
interface 68 provides control signals between channel controller
64 and IBM System/360/370 I/O interface channel
18. Device 162 comprises a voltage reference device used to
generate a FRST signal and a reset hard signal. Devices 164
comprise physical drivers for the tag interface bus. Device
166 comprises a loop-back generator that is used for testing
similar to that described with respect to bus interface 66.
Devices 168 comprise receivers which receive control signals
from IBM System/360/370 I/O interface channel 18.
Device 170, device 172 and device 174 are operable for reset
timing. Device 170 is a programmable component which is
operable for selection control for routing signals to and from
IBM System/360/370 I/O interface channel 18. Device 176
comprises a selection circuit which functions in the event of
power loss to replace device 170 and device 164 with two
hardware relays.

FIGS. 5I and 5J-1, 5J-2, 5J-3 are schematics showing
SCSI connector 65 and channel connector 69 of FIG. 3
including a number of devices interconnected as shown.
FIGS. 5J-1, 5J-2, 5J-3 show a connector board that is
connected to the main board through a connector shown in
FIG. 5I. The left side of the FIG. 5J-1 shows the bus and tag
connector. Device 180 comprises an interface to device 178
of FIG. 5I. Devices 182 comprise connectors to SCSI bus
38. Devices 184 comprise two relays which are driven by
device 176 and are used for selection.

FIG. 5K is a schematic showing a number of components
and interconnections. Device 186 provides power to the
main board. Device 188 is a power switch. The combination
of the capacitive array 190 and the beads 192 construct an
LC network. The large number of capacitors are by-pass
capacitors which supply clean voltages for the channel
controller 64 and a phase lock loop inside processor 60. The
large number of resistors 194 are all pull-up resistors and the
components 196 are spares.

Schematics for channel controller

FIGS. 6A-1, 6A-2, 6B, 6C-1, 6C-2, 6D-1, 6D-2, 6E-1,
6E-2, 6F-1, 6F-2, 6G-1, 6G-2, 6H, 6I-1, 6I-2, 6J-1, 6J-2,
6K-1, 6K-2, 6L-1, 6L-2, 6M-1, 6M-2, 6M-3, 6M-4, 6N-1,
6N-2, and 6O-1, 6O-2 are schematics showing one embodiment
of logic components and interconnections for channel
controller 64.

FIGS. 6A-1, 6A-2 are schematics showing address decode
including a number of logic components interconnected as
shown. Devices 200 of FIG. 6A comprise inputs of
channel controller 64. Channel controller 64 comprises a
XILINX® XC4006-6 which is a RAM-based field programmable
gate array having an SRAM on board. At power-up,
an image is brought from programmable ROM 74 and
loaded into SRAM on board channel controller 64. Devices
202 of FIGS. 6A-1, 6A-2 are schematic outputs of channel
controller 64. Channel controller 64 has bi-directional pins
as well. The address decode logic shown in FIGS. 6A-1,
6A-2 includes a device 204 which comprises a state machine.
Generally, address decode takes a 256 byte address and
separates the address into smaller width addresses, typically
an 8-bit, 16-bit or 32-bit address.

FIG. 6B is a schematic showing command registers
including a number of logic components interconnected as
shown. Register 206 is used to control loop back diagnostics.
Command register 208 is written only once at initialization.
For example, device 210 has the interrupt allow bit
which indicates there should be an interrupt after the next
sequence of operations is completed. Device 212 is an online
bit which tells the rest of the system that the controller is
on-line. Command register 214 and 216 store similar command
bits.

FIGS. 6C-1, 6C-2 are schematics showing the selection
controller including a number of logic components interconnected
as shown. The selection controller is used in two
different circumstances. The first circumstance is during the
poll situation where some entity is requesting use of IBM
System/360/370 I/O interface channel 18. The selection can
be characterized as mainframe 16 alerting processor 60 that
some action needs to be taken. The selection controller is
also used in the circumstance where the selection is not
intended for processor 60, and channel controller 64 functions
to propagate two signals which comprise the selection
down IBM System/360/370 I/O interface channel 18. The
selection controller is also used to create a busy signal where
a device can tell mainframe 16 via IBM System/360/370 I/O
interface channel 18 that the device is busy. The busy signal
is created, stored and generated from device 218 of FIGS.
6C-1, 6C-2.

FIGS. 6D-1, 6D-2 are schematics showing the tag
sequencer including a number of logic components interconnected
as shown. The tag sequencer operates to control
the sequence of signalling that happens over tag interface 68.
Tag interface 68 provides an interface that passes control
signals between mainframe 16 and processor 60. The channel
protocol requires particular signals to go high and low at
various times within the passing of control information.
FIGS. 6D-1, 6D-2 are schematics showing one embodiment
of a hardware solution to tag sequencing.

FIGS. 6E-1, 6E-2 are schematics showing the start and
termination controller including a number of logic components
interconnected as shown. The start and termination
controller provides intermediate states, especially the start
state which is between the selection from mainframe 16 and
the beginning of data transfer. The start and termination
controller also provides similar signalling prior to termination
of the entire order with the termination signals.

FIGS. 6F-1, 6F-2 are schematics showing the byte clocks
and "go" tag sequencer including a number of logic components
interconnected as shown. The byte clock is used
internally to physically request IBM System/360/370 I/O
interface channel 18 to transfer data to processor 60 or to
transfer data to channel 18 from processor 60 one byte at a
time. The go tag signal occurs between the start process and
the termination process and remains on while data transfer
is taking place. The upper righthand corner of FIG. 6F-2 is
a clock divider for the 18 MHz crystal signal.

FIGS. 6G-1, 6G-2 are schematics showing the data transfer
controller including a number of logic components interconnected
as shown. The data transfer controller operates
under two different types of data transfer. One type is
synchronous, and the other type is asynchronous. The data-in
and service-in lines and the service-out and data-out lines
are used in both types of data transfer. In synchronous data
transfer, which is referred to as data streaming, the data-in
and service-in lines are used alternately to transfer data
into IBM System/360/370 I/O interface channel 18. At some
point after that process begins, the service-out and data-out
lines are actuated by mainframe 16 to indicate that synchronous
data transfer was received. In the asynchronous mode
of transfer, the data-in and service-in lines are used to
request a data transfer, and the service-out and data-out lines
are then used to acknowledge the request. The propagation
delay in IBM System/360/370 I/O interface channel 18,
itself, actually reduces the transfer rate in the asynchronous
mode because processor 60 must wait for acknowledgement
on the service-out and data-out lines before data transfer can
take place.

FIG. 6H is a schematic showing the error interrupt control
including a number of logic components interconnected as
shown. The error interrupt control operates to control three
conditions from mainframe 16: resets, selective reset and
interface disconnect. Error interrupt control also has three
internally generated error conditions. The first one is a parity
error which is a standard check on data. If the data is
corrupted for some reason, the data [illegible]
and reported. Another error is a switch transition that is an
online/offline switch. If this switch has been actuated, processor
60 decides whether or not to go online or whether to
go offline based on the current status so that data transfer in
process is not interrupted. Data streaming/time out is a
counter that times the interval between responses from
mainframe 16 to ensure that if IBM System/360/370 I/O
interface channel 18 is inactive for too long, processor 16
disconnects to ensure IBM System/360/370 I/O interface
channel 18 is not reserved by a nonfunctioning entity.

FIGS. 6I-1, 6I-2 are schematics showing the status interrupt
controller including a number of logic components
interconnected as shown. One function on the righthand side
of FIG. 6I is a physical interrupt signal that goes to processor
60. The status interrupt controller generates a variety of
status signals which are used by processor 60.

FIGS. 6J-1, 6J-2 and 6K-1, 6K-2 are schematics showing
the processor data multiplexers providing an interface
between the 32-bit data bus that connects processor 60 and
channel controller 64 including a number of logic components
interconnected as shown. The processor data multiplexers
operate both for input and output of data on that bus.

FIGS. 6L-1, 6L-2 are schematics showing the bus-in and
bus-out for the connection of data lines from bus interface 66
to channel controller 64 including a number of logic components
interconnected as shown. Bus-in on the right side of
FIGS. 6L-1, 6L-2 are schematics comprising the output of
channel controller 64 to mainframe 16. Bus-out comprises
the data coming out of mainframe 16 and going into channel
controller 64. Devices 220 comprise an address multiplexer
for the 256x4 SRAM shown as device 148 of FIGS. 5F-1,
5F-2. Devices 220 comprise address modes for the SRAM,
and card 0, 1, 2, 3 comprises the data mux for the SRAM.

FIGS. 6M-1, 6M-2, 6M-3, 6M-4 are schematics showing
a data interchange and FIFO system including a number of
logic components interconnected as shown. A data path 222
from IBM System/360/370 I/O interface channel 18 is
shown in the upper lefthand corner of FIG. 6M-1. Below
that, a data path 224 from processor 64 is shown. A data path
226 to processor 64 is shown in the upper righthand corner,
and a data path 227 to the channel is shown on the lower
right side of FIGS. 6M-2, and 6M-4. Data is received from
processor 64 in a 32-bit wide data path and is received from
IBM System/360/370 I/O interface channel 18 in an 8-bit
wide data path. The data is first placed in a parallel array 228
of data input and multiplexers used to load input registers
230 in the center of FIG. 6M-1. The input registers 230 are
loaded in parallel if the data width is 32-bits or sequentially
if the data width is 8 bits. The input registers 230 are then
used to load an SRAM 232. The SRAM 232 is addressed in
a FIFO configuration to load the output registers 234. The
output registers 234 are then routed either to processor 64 or
to IBM System/360/370 I/O interface channel 18 depending
upon the desired destination. At the bottom of FIG. 6M is
shown logic 236 which is the channel positioning controller.
The channel positioning controller logic 236 controls which
of the input registers 230 is being loaded and which of the
output muxes are being accessed for output to IBM System/
360/370 I/O interface channel 18.

FIGS. 6N-1, 6N-2 are schematics showing the FIFO
push-pop controller including a number of logic components
interconnected as shown. This circuit creates control signals
[illegible] whether or not data is being pushed into the
FIFO [illegible] that control in what [illegible].

FIGS. 6O-1, 6O-2 are schematics showing the FIFO
[illegible] controller system including a number of
logic components interconnected as shown. The center part
of FIGS. 6O-1, 6O-2 are the push address counter and the
[illegible] counter which is the address control for the
FIFO. The FIFO flag controller at the bottom of FIGS. 6O-1,
6O-2 determine whether or not the FIFO is empty or full.
Direct memory access ("DMA") request is a line that is
pulled by channel controller 64 any time channel controller
64 needs data or has data to be placed in memory using a
DMA operation.

Overview

An adapter for interfacing a SCSI bus with an IBM
System/360/370 I/O interface channel constructed according
to the teachings of the present invention provides bidirectional
communication at high data bandwidth to take advantage
of the bandwidth of the IBM System/360/370 I/O
interface channel. Because of the wide availability of SCSI
device ports on computer workstations and personal
computers, the adapter of the present invention benefits
numerous information systems currently used by organizations
that include both SNA and TCP/IP environments.

An adapter constructed according to the teachings of the
present invention operates to provide numerous functional
advantages. Normal DC-interlocked and high-speed transfer
features of the IBM System/360/370 I/O interface channel
can be supported. Data streaming features of the IBM
System/360/370 I/O interface channel can also be supported
[illegible] bandwidth of data
streaming on the IBM System/360/370 I/O interface channel.
The adapter operates such that arbitrary limits to the
number of concurrent sessions are not imposed. The adapter
is flexible to allow adaptation to implement additional
channel protocols.

Although the present invention has been described in
detail, it should be understood that various changes, substitutions
and alterations can be made hereto without departing
from the spirit and scope of the invention as defined by the
appended claims.

What is claimed is:

1. A data processing system including a plurality of
computer systems, the data processing system comprising:

a first computer system having an IBM System/360/370
I/O interface channel, the first computer system operable
to communicate SNA and non-SNA protocol information
via the IBM System/360/370 I/O interface
channel;

a second computer system having a SCSI bus, the second
computer system operable to communicate SCSI protocol
information via the SCSI bus; and

an adapter coupled to the IBM System/360/370 I/O interface
channel of the first computer system and the SCSI
bus of the second computer system, the adapter operable
to interface the SCSI bus with the IBM System/
360/370 I/O interface channel to allow bidirectional
communication between the IBM System/360/370 I/O
interface channel of the first computer system and the
SCSI bus of the second computer system.
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2. The data processing system of claim 1, wherein the 
adapter is fuither operable to define a pluiality of logical 
units representing SCSI devices for communicating infor- 
mation via the SCSI bus. 

3. The data processing system of daim 2, wherein the 5 
plurality of logical units represent tape devices. 

4. The data processing system of claim 1, wherein the 
adapter comprises: 

a SCSI interface unit, the SCSI interface unit coupled to
the SCSI bus;

a channel interface unit, the channel interface unit coupled
to the IBM System/360/370 I/O interface channel; and

a processor coupled to the SCSI interface unit and to the
channel interface unit, the processor operable to control
operation of the adapter.

5. The data processing system of claim 4, wherein the
SCSI interface unit comprises a dedicated SCSI controller.

6. The data processing system of claim 4, wherein the
channel interface unit comprises a programmable logic
device.

7. The data processing system of claim 4, wherein the 
processor comprises a 32-bit microprocessor. 

8. The data processing system of claim 1, wherein the first
computer system comprises an IBM or compatible main-
frame computer system.

9. The data processing system of claim 1, wherein the
second computer system comprises a UNIX-based computer
workstation.

10. The data processing system of claim 1, wherein the
second computer system comprises a personal computer.

11. The data processing system of claim 10, wherein the
second computer system operates under a WINDOWS™
operating system.

12. An adapter for interfacing a SCSI bus with an IBM
System/360/370 I/O interface channel, the adapter compris-
ing:

a channel interface unit operable to couple to an IBM
System/360/370 I/O interface channel, the channel
interface unit operable to communicate SNA and non-
SNA protocol information via the IBM System/360/
370 I/O interface channel;

a SCSI interface unit operable to couple to a SCSI bus, the
SCSI interface unit operable to communicate SCSI
protocol information via the SCSI bus; and

a processor coupled to the channel interface unit and to
the SCSI interface unit, the processor operable to
control the channel interface unit and the SCSI inter-
face unit to allow bidirectional communication
between the SCSI bus and the IBM System/360/370
I/O interface channel.

13. The adapter of claim 12, wherein the processor is
further operable to define a plurality of logical units repre-
senting SCSI devices for communicating information via the
SCSI bus.

14. The adapter of claim 13, wherein the plurality of 
logical units represent tape devices. 

15. The adapter of claim 12, wherein the channel interface
unit, the SCSI interface unit and the processor are coupled
to a processor bus.


16. The adapter of claim 15, further comprising a front
panel coupled to the processor bus and operable to provide
a user interface.

17. The adapter of claim 15, further comprising a dynamic
RAM, a static RAM, and a programmable ROM, each
coupled to the processor bus.

18. The adapter of claim 12, wherein the SCSI interface
unit comprises a dedicated SCSI controller.

19. The adapter of claim 12, wherein the channel interface
unit comprises a programmable logic device.

20. The adapter of claim 12, wherein the processor
comprises a 32-bit microprocessor.

21. An adapter for interfacing a SCSI bus with an IBM
System/360/370 I/O interface channel, the adapter compris-
ing:

a channel connector, the channel connector operable to
couple to an interface channel port of a first computer
system, the interface channel port providing access to
an IBM System/360/370 I/O interface channel;

a bus and tag interface coupled to the channel connector;

a channel controller coupled to the bus and tag interface,
the channel controller operable to communicate SNA
protocol information via the IBM System/360/370 I/O
interface channel;

a SCSI connector, the SCSI connector operable to couple
to a SCSI device port of a second computer system, the
SCSI device port providing access to a SCSI bus;

a SCSI controller coupled to the SCSI connector, the SCSI 
controller operable to communicate SCSI protocol 
information via the SCSI bus; 

a processor bus coupled to the channel controller and to 
the SCSI controller; 

a user interface coupled to the processor bus; 

a dynamic RAM device coupled to the processor bus; 

a static RAM device coupled to the processor bus; 

a programmable ROM coupled to the processor bus; and 

a processor coupled to the processor bus, the processor
operable to manage operation of the channel controller
and the SCSI controller to allow bidirectional commu-
nication between the SCSI bus and the IBM System/
360/370 I/O interface channel.

22. The adapter of claim 21, wherein the SCSI controller
comprises an NCR35CF96-2 SCSI controller.

23. The adapter of claim 21, wherein the channel con-
troller comprises a programmable logic device.

24. The adapter of claim 21, wherein the processor
comprises a MC68360 32-bit microprocessor.

25. The adapter of claim 21, wherein the dynamic RAM
device comprises an eight megabyte DRAM.

26. The adapter of claim 21, wherein the static RAM
device comprises a 512 kilobyte SRAM.

27. The adapter of claim 21, wherein the programmable
ROM device comprises a 128 kilobyte EEPROM.

28. The adapter of claim 21, wherein the user interface
comprises a front panel having LEDs, buttons and an LCD
display.

METHOD AND SYSTEM FOR MANAGING
DATA IN CACHE USING MULTIPLE DATA
STRUCTURES

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a method and system for 
caching data and, in particular, for using multiple data 
structures to manage data stored in cache. 

2. Description of the Related Art 

Data processing systems use a high-speed managed buffer 
memory, otherwise known as cache, to store frequently used 
data that is regularly maintained in a relatively slower 
memory device. For instance, a cache can be a RAM that 
buffers frequently used data regularly stored in a hard disk
drive or a direct access storage device (DASD). After a track 
is read from the DASD, the track will be cached in RAM and 
available for subsequent data access requests (DARs). In this 
way, a storage controller processing read requests can avoid 
the mechanical delays of having to physically access and 
read data from the DASD. Cache can also be a high speed 
memory to a microprocessor to store data and instructions 
used by the microprocessor that are regularly maintained in 
RAM. Processor cache would buffer data from a volatile 
memory device, such as a DRAM or RAM. 

Often data in cache is managed according to a least 
recently used (LRU) replacement algorithm in which the 
least recently used data is demoted from the cache to make 
room for new data. A first-in-first-out (FIFO) algorithm may 
also be used. The LRU replacement algorithm works by 
organizing the data in the cache in a linked list of data entries 
which is sorted according to the length of time since the 
most recent reference to each data entry. The most recently 
used (MRU) data is at one end of the linked list, while the 
least recently used (LRU) data is at the other. Data that is 
accessed from the linked list or added for the first time is
promoted to the MRU end. When data is demoted to 
accommodate the addition of new data, the demoted data is 
removed from the LRU end. 
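
The LRU replacement behavior described above can be sketched as follows. This is a minimal illustration only, not an implementation taken from any storage controller: the class name `LRUCache` and its `access` method are hypothetical, and Python's `OrderedDict` stands in for the linked list, with the first key playing the role of the LRU end and the last key the MRU end.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: first key is the LRU end, last key is the MRU end."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key, value=None):
        """Return True on a hit; on a miss, add the entry, demoting if full."""
        if key in self.entries:
            self.entries.move_to_end(key)      # promote to the MRU end
            return True
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # demote from the LRU end
        self.entries[key] = value              # new data enters at the MRU end
        return False


cache = LRUCache(capacity=3)
for track in ["A", "B", "C", "A", "D"]:
    cache.access(track)
lru_to_mru = list(cache.entries)
```

In this run, track "B" is the entry demoted when "D" arrives, because the second reference to "A" promoted "A" to the MRU end beforehand.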

Data can be accessed sequentially or non-sequentially. In 
the non-sequential access mode, data records are randomly
requested. Such non-sequential accesses often occur when 
an application needs a particular record or data sets. Sequen- 
tial data access occurs when numerous adjacent tracks are 
accessed, such as for a data backup operation or to generate 
a large report. For instance, a disk backup usually creates 
one long sequential reference to the entire disk, thus, flood- 
ing the cache with data. One problem with LRU schemes is 
that if a sequential data access floods the cache when placed 
at the MRU end, then other non-sequential records are 
demoted and removed from cache to accommodate the large 
sequential data access. Once the non-sequential data is
demoted from cache, a data access request (DAR) for the
demoted data must be handled by physically accessing the
data from the slower memory device. 

One goal of cache management algorithms is to maintain 
reasonable "hit ratios" for a given cache size. A "hit" is a 
DAR that was returned from cache, whereas a "miss" occurs 
when the requested data is not in cache and must be retrieved 
from DASD. A "hit ratio" is empirically determined from the
number of hits divided by the total number of DARs, both 
hits and misses. System performance is often determined by 
the hit ratio. A system with a low hit ratio may cause delays 
to application program processing while requested data is 
retrieved from DASD. 
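
To make the hit ratio and the flooding problem concrete, the sketch below replays two invented workloads against a plain LRU cache and counts hits and misses. The workloads, the capacity of four tracks, and the function names `run` and `hit_ratio` are assumptions made purely for illustration.

```python
from collections import OrderedDict


def run(workload, capacity):
    """Replay DARs against a plain LRU cache; return (hits, misses)."""
    cache = OrderedDict()
    hits = misses = 0
    for track in workload:
        if track in cache:
            hits += 1
            cache.move_to_end(track)           # promote to the MRU end
        else:
            misses += 1
            if len(cache) >= capacity:
                cache.popitem(last=False)      # demote from the LRU end
            cache[track] = True
    return hits, misses


def hit_ratio(hits, misses):
    """Hits divided by the total number of DARs (hits plus misses)."""
    total = hits + misses
    return hits / total if total else 0.0


hot = ["r1", "r2", "r3"]                       # non-sequentially accessed records
backup = [f"t{i}" for i in range(10)]          # one long sequential scan

quiet = hot * 3                                # no sequential flood
flooded = hot + backup + hot * 2               # flood between the hot accesses

quiet_ratio = hit_ratio(*run(quiet, capacity=4))
flooded_ratio = hit_ratio(*run(flooded, capacity=4))
```

With these inputs the sequential scan pushes all three hot records out of cache, so they must be fetched again after the scan and the flooded workload ends with a much lower hit ratio than the quiet one.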

A low hit ratio indicates that the data often was not in
cache and had to be retrieved from DASD. Low hit ratios
may occur if non-sequentially accessed data is "pushed" out
of the cache to make room for a long series of sequentially
accessed data. The higher probability of subsequent DARs
toward non-sequentially accessed data further lowers the hit
ratio because non-sequentially accessed data has a greater
likelihood of being accessed. Moreover, the non-sequentially
accessed data is "pushed out" of cache to make
room for sequentially accessed data that has a lower likeli-
hood of being accessed.

In certain systems, sequential data is placed at the LRU
end and non-sequential data at the MRU end. Such meth-
odologies often have the effect of providing an unreasonably
low hit ratio for sequentially accessed data because the
sequentially accessed data has some probability of being
accessed (although usually less than non-sequentially
accessed data). Algorithms that place sequentially accessed
data at the LRU end cause the sequential data to be demoted
very quickly, thus providing a relatively low hit ratio. Still
further, if there is a continued sequence of write operations,
i.e., modified data, read data could be pushed off the LRU
list in cache, thus lowering the hit ratio for read accessed
data.

In current storage controller systems, a battery backed up
RAM or non-volatile storage unit (NVS) may maintain a
shadow copy of all modified data in cache. Storage systems
provided by International Business Machines Corporation
("IBM") include two write operations, a cache fast write
(CFW) and a DASD fast write (DFW). In a DASD fast write
operation, data is written to both the cache and the NVS unit.
The DASD fast write operation allows fast write hits by
maintaining two copies of all data modifications, one in
cache and another in NVS storage. The non-volatile storage
protects against data loss by saving the data for up to 48
hours (assuming a fully-charged battery) if power fails.
When power is restored, then the data may be destaged from
the NVS unit to DASD. DASD fast write applies to all write
hits and to predictable writes. A write hit occurs when the
requested data is in the cache.

Cache fast write (CFW) improves write operation perfor- 
mance for data that the user does not need to store on DASD. 
Because the data does not have to be stored on the DASD, 
cache fast write eliminates DASD access time for write hits 
and predictable write operations as the write data need only 
be stored in cache. Further, cache fast write does not use 
non-volatile storage. However, cache fast write data may be
written to DASD during the execution of cache management
algorithms. Aspects of the DASD fast write and cache fast
write operations are described in IBM publication entitled
"Storage Subsystem Library: IBM 3990 Storage Control 
Reference (Models 1, 2, and 3)", IBM document no. GA32- 
0099-06, (IBM Copyright 1988, 1994), which publication is 
incorporated herein by reference in its entirety. 
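
The difference between the two write operations can be sketched with plain dictionaries standing in for the cache, the NVS unit, and DASD. The function names below are hypothetical and do not correspond to any IBM 3990 command or interface; the sketch only assumes what the text states, namely that a DASD fast write keeps a second copy in NVS while a cache fast write does not.

```python
def dasd_fast_write(track, data, cache, nvs):
    """DFW: keep two copies of the modification, one in cache and one in NVS."""
    cache[track] = data
    nvs[track] = data


def cache_fast_write(track, data, cache):
    """CFW: the write lands in cache only; NVS is not used."""
    cache[track] = data


def destage_after_power_restore(nvs, dasd):
    """Copy every surviving NVS entry to DASD, then empty the NVS."""
    dasd.update(nvs)
    nvs.clear()


cache, nvs, dasd = {}, {}, {}
dasd_fast_write("T1", b"payroll", cache, nvs)
cache_fast_write("T2", b"scratch", cache)

cache.clear()                        # simulate losing the volatile cache
destage_after_power_restore(nvs, dasd)
```

After the simulated failure only track "T1" reaches DASD, which matches the stated purpose of cache fast write: it is intended for data the user does not need to store on DASD.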
When the NVS unit reaches a predetermined threshold,
data in the NVS unit must be destaged from the NVS. In
current systems, the entire LRU linked list including all data
entries must be scanned to locate the least recently used
DASD fast write data to select as a candidate for destaging
from the NVS unit.

Moreover, with current systems, frequently updated data
will not be destaged to disk because when the data is
updated, a new time stamp is provided which places the
modified data at the top of the LRU list. Thus, frequently
modified data may be more susceptible to loss as a result of
system failures because such data tends not to be destaged
and is instead systematically placed at the MRU end of the
linked list.

SUMMARY OF THE PREFERRED
EMBODIMENTS

To overcome the limitations in the prior art described 
above, preferred embodiments disclose a cache management 
scheme. A first and second data structures indicate data 
entries in a cache. Each data structure has a most recently 
used (MRU) entry, a least recently used (LRU) entry, and a 
time value associated with each data entry indicating a time 
the data entry was indicated as added to the MRU entry of 
the data structure. A processing unit receives a new data 
entry. In response, the processing unit processes the first and 
second data structures to determine a LRU data entry in each 
data structure and selects from the determined LRU data 
entries the LRU data entry that is the least recently used. The 
processing unit then demotes the selected LRU data entry 
from the cache and data structure including the selected data 
entry. The processing unit adds the new data entry to the 
cache and indicates the new data entry as located at the 
MRU entry of one of the first and second data structures. 
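
The selection step summarized above, in which the LRU entries of two data structures are compared and the older one is demoted, might look like the following sketch. The class `TwoListCache`, the list names "seq" and "nonseq", and the use of a simple counter as the time value are assumptions for illustration; a capacity of at least one entry is also assumed.

```python
from collections import OrderedDict
from itertools import count


class TwoListCache:
    """Two LRU-ordered lists sharing one cache capacity (illustrative only)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.clock = count(1)                  # source of time values
        self.lists = {"seq": OrderedDict(), "nonseq": OrderedDict()}

    def add(self, key, kind):
        """Add a new entry at the MRU end of list `kind`.

        Returns the key that was demoted to make room, or None.
        """
        demoted = None
        if sum(len(lst) for lst in self.lists.values()) >= self.capacity:
            candidates = []
            for name, lst in self.lists.items():
                if lst:
                    # The first item of each list is that list's LRU entry.
                    lru_key, time_value = next(iter(lst.items()))
                    candidates.append((time_value, name, lru_key))
            # The smallest time value is the least recently used overall.
            _, name, demoted = min(candidates)
            del self.lists[name][demoted]
        self.lists[kind][key] = next(self.clock)
        return demoted


cache = TwoListCache(capacity=3)
adds = [("s1", "seq"), ("n1", "nonseq"), ("s2", "seq"),
        ("n2", "nonseq"), ("n3", "nonseq")]
demotions = [cache.add(key, kind) for key, kind in adds]
```

Only the head of each list is inspected, so the cost of choosing a victim depends on the number of lists rather than on the number of cached entries.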

In preferred embodiments, the data structures are linked 
lists, wherein the MRU entry is at one end of the linked list, 
and the LRU entry is at the other end. 

In further embodiments, the first data structure is a list of
data in cache sequentially accessed and the second data 
structure is a list of data in cache non-sequentially accessed. 
In yet further embodiments, the first data structure is a list 
of data written as part of a first type of write operation and 
the second data structure is a list of data written as part of 
a second type of write operation. Data written to the cache 
as part of the first type of write operation is also written to 
a storage unit. 

In still additional embodiments, a third data structure of 
data entries from the first and second data structures has data 
entries having a modified time value that indicates when the 
data entry was first modified in cache. The processing unit 
provides a base time value indicating a previous time value. 
For those data entries in the third data structure having a 
time value that is older than the base time value, the 
processing unit destages the copy of the modified data 
maintained in cache to the storage unit. 
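
A rough sketch of the third data structure follows. It assumes a dictionary mapping each modified track to the time value at which it was first modified; the function name `destage_older_than` and the sample values are invented for the example.

```python
def destage_older_than(base_time, first_modified, cache, storage):
    """Destage every entry whose first-modified time is older than base_time."""
    destaged = []
    for track, modified_at in list(first_modified.items()):
        if modified_at < base_time:
            storage[track] = cache[track]      # write the cached copy out
            del first_modified[track]          # entry is no longer pending
            destaged.append(track)
    return destaged


# "T1" has been rewritten several times, but its first-modified time is kept.
cache = {"T1": "v3", "T2": "v1", "T3": "v1"}
first_modified = {"T1": 10, "T2": 25, "T3": 40}
storage = {}

done = destage_older_than(30, first_modified, cache, storage)
```

Because the time value records the first modification and is not refreshed by later updates, a frequently rewritten track such as "T1" still falls behind the base time value and is destaged.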

Preferred embodiments provide a method and system for 
managing data in cache in different LRU linked lists. Using 
separate lists, the cache can be more efficiently managed to
properly select data entries for demoting. When the linked
lists distinguish between non-sequentially and sequentially
accessed data, sequentially accessed data may be readily
accessed from the sequentially accessed linked list for
demotion to prevent sequentially accessed data from domi-
nating cache. Moreover, if separate lists are used to distin-
guish between a cache fast write (CFW) and DASD fast
write (DFW) where data is backed up in non-volatile
storage, then when data is to be destaged from the non-
volatile storage, a value to destage can be readily located
from the DFW list. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Referring now to the drawings in which like reference 
numbers represent corresponding parts throughout: 

FIG. 1 is a block diagram illustrating a software and 
hardware environment in which preferred embodiments of 
the present invention are implemented; 

FIGS. 2a, b illustrate linked lists of cache entries in
accordance with preferred embodiments of the present 
invention; 

FIG. 3 illustrates logic to process a data access request
(DAR) and demote data from cache in accordance with
preferred embodiments of the present invention;

FIGS. 4a, b illustrate logic to process a write request and
destage data from a non-volatile storage (NVS) unit in
accordance with preferred embodiments of the present
invention; and

FIG. 5 illustrates logic to destage data from the NVS unit 
in accordance with preferred embodiments of the present 
invention. 

DETAILED DESCRIPTION OF THE 
PREFERRED EMBODIMENTS 

In the following description, reference is made to the 
accompanying drawings which form a part hereof, and 
which is shown, by way of illustration, several embodiments 
of the present invention. It is understood that other embodi- 
ments may be utilized and structural changes may be made 
without departing from the scope of the present invention. 

Hardware and Software Environment 

FIG. 1 illustrates a hardware environment in which pre-
ferred embodiments are implemented. A plurality of host
systems 4a, b, c are in data communication with a DASD 6
via a storage controller 8. The host systems 4a, b, c may be
any host system known in the art, such as a mainframe
computer, workstations, etc., including an operating system
such as WINDOWS®, AIX®, UNIX®, MVS™, etc. AIX is
a registered trademark of IBM; MVS is a trademark of IBM;
WINDOWS is a registered trademark of Microsoft Corpo-
ration; and UNIX is a registered trademark licensed by the
X/Open Company LTD. A plurality of channel paths 10a, b,
c in the host systems 4a, b, c provide communication paths
to the storage controller 8. The storage controller 8 and host
systems 4a, b, c may communicate via any network or
communication system known in the art, such as LAN,
TCP/IP, ESCON®, SAN, SNA, Fibre Channel, SCSI, etc.
ESCON is a registered trademark of International Business
Machines Corporation ("IBM"). The host system 4a, b, c
executes commands and receives returned data along a
selected channel 10a, b, c. The storage controller 8 issues
commands to physically position the electromechanical
devices to read the DASD 6. In preferred embodiments, the
structure of the storage controller 8 and interface between
the storage controller 8 and host system may include aspects
of the storage controller architecture described in the fol-
lowing U.S. patent applications assigned to IBM: "Failover
System for a Multiprocessor Storage Controller," by Brent
C. Beardsley, Matthew J. Kalos, Ronald R. Knowlden, Ser.
No. 09/026,622, filed on Feb. 20, 1998; and "Failover and
Fallback System for a Direct Access Storage Device," by
Brent C. Beardsley and Michael T. Benhase, Ser. No.
08/988,887, filed on Dec. 11, 1997, both of which applica-
tions are incorporated herein by reference in their entirety.

The storage controller 8 further includes a cache 12. In 
alternative embodiments, the cache 12 may be implemented 
in alternative storage areas accessible to the storage con- 
troller 8. In preferred embodiments, the cache 12 is imple- 
mented in a high speed, volatile storage area within the 
storage controller 8, such as a DRAM, RAM, etc. The length 
of time since the last use of a record in cache is maintained 
to determine the frequency of use of cache. Data can be 
transferred between the channels 10a, b, c and the cache 12, 
between the channels 10a, b, c and the DASD 6, and
between the DASD 6 and the cache 12. In alternative
embodiments with branching, data retrieved from the DASD
6 in response to a read miss can be concurrently transferred
to both the channel 10a, b, c and the cache 12 and a data
write can be concurrently transferred from the channel 10a,
b, c to both a non-volatile storage unit and cache 12.

Also included in the storage controller 8 is a non-volatile
storage (NVS) unit 14, which in preferred embodiments is a
battery backed-up RAM, that stores a copy of modified data
maintained in the cache 12. In this way if failure occurs and
the modified data in cache 12 is lost, then the modified data
may be recovered from the NVS unit 14.

To determine whether a DAR is sequentially or non-
sequentially accessed, a command may be used to inform the
storage controller 8 that a following DAR request is part of
a series of sequentially accessed data. For instance, in the
IBM mainframe environment, a Define Extent command
indicates whether the following I/O operations are part of a
sequential access. A description of the Define Extent com-
mands is provided in the IBM publication, "IBM 3990/9390
Storage Control Reference," IBM Document no. GA32-
0274-04 (Copyright IBM, 1994, 1996), which publication is
incorporated herein by reference in its entirety.
Alternatively, the storage controller 8 may utilize a predic-
tive buffer memory management scheme to detect whether
a sequence of data transfer operations are part of a sequential
data access request. Such predictive buffer memory man-
agement schemes are described in U.S. Pat. No. 5,623,608,
entitled "Method and Apparatus for Adaptive Circular Pre-
dictive Buffer Management," assigned to IBM, and which
patent is incorporated herein by reference in its entirety.
These predictive memory buffer schemes may be used to
detect sequential access for SCSI or mainframe type
requests when the DARs do not specifically indicate whether
the DAR is part of an ongoing sequential access.

Multiple Linked Lists in Cache 

FIGS. 2a, b illustrate preferred embodiments of doubly
linked list data structures 20a, b, c, c', d, d', e maintained in
cache 12. Each linked list 20a, b, c, c', d, d', e is comprised
of a node including a list of pointers to data or track entries
in the cache 12. Thus, a data entry in cache may be on
multiple lists. In preferred embodiments, list 20a comprises
pointers to all k sequentially accessed data entries in cache
12; list 20b comprises pointers to all l non-sequentially
accessed entries in cache 12; list 20c comprises all m
sequentially accessed cache fast write (CFW) entries in
cache 12; list 20c' comprises all m' non-sequentially
accessed cache fast write (CFW) entries in cache 12; list 20d
comprises all n sequentially accessed DASD fast write
(DFW) entries in cache 12; list 20d' comprises all n' non-
sequentially accessed DASD fast write (DFW) entries in
cache 12; and list 20e in FIG. 2b comprises a set of modified
entries in cache 12 and a sequence number indicating when
the entry was first modified in cache 12. Thus, lists 20a and
20b combined include all entries in cache and lists 20c, c'
and 20d, d' combined include all modified data entries in
cache.

Each list includes an anchor entry 22a, b, c, c', d, d', e
which includes a pointer to the top of the list or most recently
used (MRU) end 24a, b, c, c', d, d', e and a pointer to the
bottom of the list or the least recently used (LRU) end 26a,
b, c, c', d, d', e. As the list is doubly linked, each entry in the
lists includes a pointer to the entry above, i.e., closer to the
MRU end 24a, b, c, c', d, d', e, referred to herein as an "up
pointer," and a pointer to the entry below, i.e., closer to the
LRU end 26a, b, c, c', d, d', e, referred to herein as a "down
pointer."
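
The anchor entry and the up and down pointers can be modeled as in the sketch below. The class names `Entry` and `Anchor` and the methods `add_mru`, `remove`, and `demote_lru` are hypothetical; they are one plausible way of maintaining the pointers described above, not code from the preferred embodiments.

```python
class Entry:
    def __init__(self, track):
        self.track = track
        self.up = None       # neighbor closer to the MRU end
        self.down = None     # neighbor closer to the LRU end


class Anchor:
    """Anchor entry holding pointers to the MRU (top) and LRU (bottom) ends."""

    def __init__(self):
        self.mru = None
        self.lru = None

    def add_mru(self, entry):
        """Place an entry that is not currently in the list at the MRU end."""
        entry.up, entry.down = None, self.mru
        if self.mru is not None:
            self.mru.up = entry
        else:
            self.lru = entry             # first entry is both MRU and LRU
        self.mru = entry

    def remove(self, entry):
        """Unlink an entry and make its former neighbors address each other."""
        if entry.up is not None:
            entry.up.down = entry.down
        else:
            self.mru = entry.down        # removed entry was at the MRU end
        if entry.down is not None:
            entry.down.up = entry.up
        else:
            self.lru = entry.up          # removed entry was at the LRU end
        entry.up = entry.down = None

    def demote_lru(self):
        """Remove and return the entry at the LRU end, or None if empty."""
        victim = self.lru
        if victim is not None:
            self.remove(victim)
        return victim

    def tracks_mru_to_lru(self):
        out, node = [], self.mru
        while node is not None:
            out.append(node.track)
            node = node.down
        return out


anchor = Anchor()
e1, e2, e3 = Entry(1), Entry(2), Entry(3)
for e in (e1, e2, e3):
    anchor.add_mru(e)
anchor.remove(e2)        # an accessed entry is first removed ...
anchor.add_mru(e2)       # ... and then placed back at the MRU end
order_after_access = anchor.tracks_mru_to_lru()
demoted = anchor.demote_lru()
```

Keeping both end pointers in the anchor lets insertion at the MRU end and demotion from the LRU end run in constant time without walking the list.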

Associated with each entry in the list is an MRU sequence
counter indicating a time stamp when the data was added to
the linked list 20a, b, c, c', d, d', e. In preferred embodiments,
when a data entry is added to the linked lists 20a, b, c, c', d,
d', the previous MRU sequence counter is incremented to
determine the current MRU sequence number. In this way,
entries with a lower sequence counter number have
remained in the list unaccessed for a longer period of time,
i.e., less recently used. In preferred embodiments, after the
storage controller 8 accesses data, the data is added back to
the MRU end 24a, b, c, c', d, d'. However, list 20e does not
alter the sequence number of data entries after the data is
placed in cache 12 as list 20e maintains the sequence number
for when the entry was first modified in cache 12 and does
not change as a result of subsequent modifications or
accesses to the data entries. As entries are added at the MRU
end 24a, b, c, c', d, d', e other entries move down the doubly
linked list toward the LRU end 26a, b, c, c', d, d', e.
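
The two kinds of sequence number can be contrasted in a few lines. In this sketch a single incrementing counter is assumed, `access_seq` stands in for the sequence numbers kept on an access list such as 20a or 20b, and `first_modified_seq` stands in for those kept on list 20e; all names are illustrative.

```python
from itertools import count

mru_counter = count(1)       # hypothetical MRU sequence counter
access_seq = {}              # refreshed every time the entry is accessed
first_modified_seq = {}      # set once, when the entry is first modified


def touch(track, modified=False):
    """Record an access and return the sequence number assigned to it."""
    seq = next(mru_counter)
    access_seq[track] = seq
    if modified:
        # Later modifications do not replace the first-modified number.
        first_modified_seq.setdefault(track, seq)
    return seq


touch("T1", modified=True)   # sequence number 1
touch("T2")                  # sequence number 2
touch("T1", modified=True)   # sequence number 3

least_recently_used = min(access_seq, key=access_seq.get)
```

Track "T1" ends with the highest access sequence number while keeping its original first-modified number, which is the distinction drawn above between list 20e and the other lists.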

The lists 20a, b, c, c', d, d', e may be implemented as a
doubly linked list of pointers to the data in cache 12 as
shown in FIG. 2. Alternatively, the lists 20a, b, c, c', d, d',
e may be implemented in control blocks allocated in cache
12, wherein each track or data set has a corresponding
control block in the control block section of cache 12. If a
track or data set was in the list 20a, b, c, c', d, d', e, then the
control block for such track would include fields indicating
for the data entry in cache 12 the "up pointer," "down
pointer," MRU sequence number for each list 20a, b, c, c',
d, d', e including the data entry. Another control block could
maintain the anchor 22a, b, c, c', d, d', e information to
access the beginning and end of the list. Embodiments
utilizing the control blocks may be used to avoid dynamic
memory allocation. In still further embodiments, the list
20a, b, c, c', d, d', e may be comprised of the actual data
instead of just pointers to such data.

By providing separate lists, the storage controller 8 may
more effectively manage cache 12 and determine which
entries to demote or destage from the NVS unit 14. For
instance, to prevent sequentially accessed data from domi-
nating cache 12, the storage controller 8 can use the sequen-
tially accessed list 20a to select data to demote from cache
12. Moreover, the storage controller 8 may modify the
sequence number of sequentially accessed data to provide
sequentially accessed data with a relatively lower sequence
number than the sequence number provided non-
sequentially accessed data. In this way, the storage controller
8 would be more likely to demote sequentially accessed data
over non-sequentially accessed data. For instance, the stor-
age controller 8 may accelerate or modify sequence numbers
added to the sequential CFW 20c and DFW 20d lists to
provide sequential modified data with a relatively lower
sequence number so the sequential modified data is demoted
before non-sequential modified data.
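
One simple way to give sequentially accessed data a relatively lower sequence number is to subtract a fixed handicap when the number is assigned. The text does not specify how the adjustment is computed, so the constant `SEQUENTIAL_HANDICAP`, its value, and the function `assign_sequence` are assumptions chosen only to show the effect.

```python
SEQUENTIAL_HANDICAP = 50     # hypothetical amount subtracted for sequential data


def assign_sequence(counter, sequential):
    """Return the sequence number to store for an entry added at `counter`."""
    return counter - SEQUENTIAL_HANDICAP if sequential else counter


entries = {
    "nonseq_old": assign_sequence(100, sequential=False),
    "seq_new": assign_sequence(120, sequential=True),
    "nonseq_new": assign_sequence(121, sequential=False),
}

# The entry with the lowest stored sequence number is demoted first.
first_to_demote = min(entries, key=entries.get)
```

Although the sequential entry was added after "nonseq_old", its adjusted number is lower, so it becomes the first demotion candidate.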

The storage controller 8 could also use the CFW 20c, c'
and DFW 20d, d' lists comprising modified data to prevent
modified data entries from dominating cache by selecting
modified entries from the CFW 20c, c' and DFW 20d, d' lists
to demote after modified data, i.e., the total of entries in the
CFW 20c, c' and DFW 20d, d' lists, reaches a predetermined
threshold level of cache 12.
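
The threshold test on modified data could be expressed as below. The list names, the default limit of one half of cache capacity, and the function `lists_to_demote_from` are all assumptions for illustration; the text states only that some predetermined threshold level is used.

```python
MODIFIED_LISTS = ("cfw_seq", "cfw_nonseq", "dfw_seq", "dfw_nonseq")


def lists_to_demote_from(list_sizes, cache_capacity, modified_limit=0.5):
    """Choose which lists supply demotion candidates.

    When the modified entries (all CFW and DFW lists together) reach the
    given fraction of cache capacity, candidates come from those lists;
    otherwise they come from the sequential and non-sequential lists.
    """
    modified_total = sum(list_sizes[name] for name in MODIFIED_LISTS)
    if modified_total >= modified_limit * cache_capacity:
        return list(MODIFIED_LISTS)
    return ["seq", "nonseq"]


light = {"cfw_seq": 5, "cfw_nonseq": 5, "dfw_seq": 10, "dfw_nonseq": 10}
heavy = {"cfw_seq": 20, "cfw_nonseq": 10, "dfw_seq": 15, "dfw_nonseq": 15}

light_choice = lists_to_demote_from(light, cache_capacity=100)
heavy_choice = lists_to_demote_from(heavy, cache_capacity=100)
```

With a capacity of 100 entries, 30 modified entries stay under the assumed limit while 60 modified entries exceed it and redirect demotion to the CFW and DFW lists.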

Data in cache 12 may be on more than one list. For
instance, a data track can be on one of the sequential list 20a
or non-sequential list 20b, but not both, and on one of the
CFW list 20c, c' or DFW list 20d, d', but not more than one
of the lists, and on the modified list 20e indicating when the
data was first modified in cache 12. Moreover, the sequence
number for a data entry maintained on two lists may be
different. For instance, a data entry on the DFW lists 20d, d'
has a sequence number indicating when the entry was last
DASD fast write modified. If this entry is subsequently
accessed sequentially as part of a read request, then this
entry would have a more current sequence number on the
sequential list 20a than the DFW lists 20d, d', which remains
unchanged as a result of the access. The sequential 20a and
non-sequential 20b lists are updated on either a read or write
data access request. Moreover, the same data entry may have
an even earlier sequence number on the list 20e indicating
when the data entry was first modified in cache 12.

FIG. 2b illustrates a preferred embodiment of the time
sequence destage list 20e which is used to determine when
to destage entries from cache 12 that were modified. At
predetermined time intervals, the storage controller 8 would
access the list 20e and destage all entries having a sequence
number less than a certain previous time value. In this way,
the storage controller 8 can insure that data will not remain
unmodified and not destaged to DASD 6 for an extended
period of time. Once a data entry is destaged from the time
sequence list 20e, the DFW list 20d, d' or CFW list 20c, c',
the data entry is removed from the time sequence list 20e.

To add a data entry to the MRU end of a list 20a, b, c, c',
d, d', e, the storage controller 8 would modify the MRU
pointer to point to the added data entry, adjust the up pointer
of the previous MRU entry to address the new (added data
entry) MRU data entry and adjust the down pointer of the
new MRU entry to address the previous MRU data entry. A
sequence number is maintained to indicate when a data entry
was placed at the MRU end 24a, b, c, c', d, d', e of the list
20a, b, c, c', d, d', e. When placing a data entry at the MRU
end 24a, b, c, c', d, d', e the storage controller 8 would
increment the sequence number. To place a data entry at the
MRU end 24a, b, c, c', d, d' that is already in the list, the
storage controller 8 would have to first remove the data entry
from the list before placing the data entry at the MRU end.
To remove a data entry from a list, the up and down pointers
of the removed data entry are set to null and the up and down
pointers of the entries in the list 20a, b, c, c', d, d', e that
addressed the removed data entry are modified to address
each other. If a data entry was removed from the LRU 26a,
b, c, c', d, d', e or the MRU 24a, b, c, c', d, d', e end, then
the MRU 24a, b, c, c', d, d', e and LRU pointers 26a, b, c,
c', d, d', e would have to be modified.

To demote a data entry from the LRU end 26a, b, c, c', d,
d', e of the list 20a, b, c, c', d, d', e, the storage controller 8
would modify the LRU pointer 26a, b, c, c', d, d', e in the
anchor entry 22a, b, c, c', d, d', e to address the data entry
addressed by the up pointer of the entry to be demoted, set
the down pointer of the data entry addressed by the new
LRU data entry to null, and modify the up pointer of the data
entry to demote to null.

Cache Management

FIG. 3 illustrates logic executed by the storage controller
8 to process a read request. Because a read request does not
concern modified data, only the sequential 20a and non-
sequential 20b lists are used. Control begins at block 30
which represents the storage controller 8 processing a read
DAR. Control transfers to block 32 which represents the
storage controller 8 determining whether the requested data
is in cache 12. If so, control transfers to block 34; otherwise
control transfers to block 36. At block 34, the storage
controller 8 accesses the requested data from cache 12 and

MRU end 24a of the sequential list 20a and provide the data
entry the current MRU sequence number. The current MRU
sequence number may be calculated by incrementing the
previous sequence number. If the data entry was on the
non-sequential list 20b, then the storage controller 8 would
remove the data from the non-sequential list. Otherwise, if
DAR read request is non-sequential, then control trans-
fers to block 42 which represents the storage controller
8 placing the accessed data at the MRU end 24b of the
non-sequential list 20b and, if necessary, removing the data
from the sequential list 20a.

[...] block 36 which represents the storage controller
8 [...] requested data from DASD 6. Control then
[...] represents the storage controller
8 [...] cache 12 capacity threshold. If no, control transfers
[...] 24a, b
of the appropriate list 20a, b. If the cache 12 capacity
threshold would be exceeded, then the storage controller 8
would demote a data entry from the LRU end 26a, b of one
of the sequential or non-sequential linked lists 20a, b in
cache 12.

Control transfers to block 46 which represents the storage
controller determining whether the requested data is from a
sequential DAR. If so, control transfers to block 48;
otherwise, control transfers to block 50. Block 48 represents
the storage controller 8 determining whether the addition of
[...] amount of sequentially accessed data
in cache 12 to exceed some predetermined threshold. The
storage controller 8 may make such determination by con-
sidering the total number of sequentially accessed data
entries in the sequential list 20a. If the addition of the new
sequentially accessed data entry will cause the limit to be
exceeded, then control transfers to block 52; otherwise,
control transfers to block 50. Block 52 represents the storage
controller 8 demoting data at the LRU end 26a of the
sequential list 20a, overlaying the new data onto the location
of the demoted data, and returning the requested data.
Control then transfers to block 54 where the storage con-
troller 8 places the accessed data at the MRU end 24a of the
sequential list 20a with the new MRU sequence number.

If sequential data does not have to be demoted, then the
storage controller 8 must determine the sequentially or
non-sequentially accessed data entry that has been in cache
12 the longest without being accessed, i.e., the LRU entry
26a, b, across both lists 20a, b. At block 50, the storage
controller 8 determines whether either of the lowest
sequence numbers in the lists 20a, b have wrapped. For
instance, if the maximum sequence counter is 100, then any
sequence number less than 100 is either that number under
100 or greater than 100 if the counter "wrapped." For
instance a sequence number of 4 can be either 4, 104, 204,
etc. To determine whether a wrap occurred, in preferred
embodiments, the storage controller 8 may determine
whether the difference of the LRU 26a, b, i.e., lowest
sequence numbers from both lists 20a, b, is greater than half
the maximum sequence count. For instance, if the maximum
sequence count is 100 and the LRU (e.g., lowest sequence
numbers) on the lists 20a, b are 3 and 94, then the difference,

returns the accessed data to the requestor, i.e., application or 91, is greater than half the maximum sequence count, 50. In 

device initiating the read DAR. Control then transfers to such case, the storage controUer 8 selects the greatest 

block 38 which represents the storage controller 8 determin- sequence number, 94, as it indicates an older sequence 
ing whether the accessed read DAR is a sequential access 65 number than the wrapped sequence number 3, 

request. If so, control transfers to block 40 which represents If one of the sequence numbers wrapped, then control 

the storage controller 8 placing the accessed data at the transfers to block 56, which represents the storage controller 
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8 selecting the LRU data entry 26a, b from the lists 20aj b 
having the larger sequence number as the candidate for 
demotion instead of the LRU data entry 26a, b having the 
lowest sequence number. Otherwise, control transfers to 
block 58 which represents the storage controller 8 selecting 
the LRU data entry 26a, b from the list 20a, b having the 
lowest sequence number, in which case there was not a wrap. 
From block 56 or 58, control transfers to block 60 which 
represents the storage controller 8 demoting the selected 
data entry by invalidating the selected data entry in cache 12 
and overlaying the invalidated data entry with the new data 
entry. Control then transfers to block 34 to return the 
accessed data and place the accessed data at the MRU end 
24a, b of the appropriate list 20a, b. 
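
The wrap test of blocks 50 through 58 can be illustrated with a
short Python sketch. The function name and its parameters are
hypothetical, chosen only for illustration; the comparison
against half the maximum sequence count follows the example
given above, and the sketch assumes that at most one of the two
sequence numbers has wrapped.

```python
def select_demotion_candidate(lru_seq_a, lru_seq_b, max_seq_count):
    """Return the sequence number of the older of two LRU entries.

    lru_seq_a, lru_seq_b: sequence numbers found at the LRU ends
    of the two lists (hypothetical parameter names).
    max_seq_count: count at which the sequence counter wraps.
    """
    if abs(lru_seq_a - lru_seq_b) > max_seq_count / 2:
        # Block 56: a wrap is inferred, so the larger number is older.
        return max(lru_seq_a, lru_seq_b)
    # Block 58: no wrap inferred, so the smaller number is older.
    return min(lru_seq_a, lru_seq_b)
```

With the values from the example, select_demotion_candidate(3,
94, 100) returns 94, while two unwrapped numbers such as 10 and
20 yield 10.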

FIGS. 4a, b illustrate logic executed by the storage 
controller 8 to process a write operation to update data in the 
DASD 6. Control begins at block 70 which represents the 
storage controller 8 processing a write operation including 
modified data to update in the DASD 6. Control transfers to 
block 72 which is a decision block representing the storage 
controller 8 determining whether the data entry to update is
already in cache 12. If so, control transfers to block 74;
otherwise, control transfers to block 102 in FIG. 4b. If the
data entry to update is in cache 12, then at block 74 the
storage controller 8 updates the data entry in cache 12 and
proceeds in parallel to execute blocks 78, 80, and 82 to place
the updated data entries on the appropriate lists 20a, b, c, c',
d, d'. At block 78, the storage controller 8 determines whether
the write is a cache fast write (CFW). If the write is a CFW,
then control transfers to block 84 which represents the
storage controller 8 placing the updated data entry at the
MRU end 24c, c' of the appropriate CFW list 20c, c' with the
new MRU sequence number. Sequentially and
non-sequentially written CFW data is placed on lists 20c, c',
respectively. If the data entry was previously on one of the
DASD fast write (DFW) lists 20d, d', then the storage
controller 8 would remove the updated data entry from the
DFW list 20d, d'. If the write is a DFW, then control transfers
to block 86 which represents the storage controller 8 placing
the updated data entry at the MRU end 24d, d' of the
appropriate DFW list 20d, d' with the new sequence number.
Sequentially and non-sequentially written DFW data is
placed on lists 20d, d', respectively. If the data entry was
previously on one of the CFW lists 20c, c', then the storage
controller 8 would remove the updated data entry from the
CFW list 20c, c'.

At block 80, the storage controller 8 determines whether 
the DAR write was a part of a series of sequential write 
operations. If so, control transfers to block 88 which repre- 
sents the storage controller 8 placing the updated data entry 
at the MRU end 24a of the sequential list 20a with the new 
sequence number. If the data entry was previously on the
non-sequential list 20b, then the storage controller 8 would
remove the updated data entry from the non-sequential list
20b. If the write operation was a non-sequential operation,
then control transfers to block 90 which represents the
storage controller 8 placing the updated data entry at the
MRU end 24b of the non-sequential list 20b with the new
sequence number. If the data entry was previously on the
sequential list 20a, then the storage controller 8 would
remove the updated data entry from the sequential list 20a.
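
The list bookkeeping of blocks 78 through 90 can be sketched in
Python as follows. This is only an illustration under stated
assumptions: each list is modeled as an OrderedDict whose last
item is the MRU end, the names new_lists, place_at_mru and
record_write are hypothetical, and removing the entry from every
other modified-data list (not just the lists of the other write
type) is an assumption made so that an entry sits on one
modified-data list at a time.

```python
from collections import OrderedDict

# Short names for lists 20a, 20b, 20c, 20c', 20d, 20d'.
LIST_NAMES = ("a", "b", "c", "c'", "d", "d'")
MODIFIED_LISTS = ("c", "c'", "d", "d'")


def new_lists():
    # Each list maps entry id -> sequence number; last item is the MRU end.
    return {name: OrderedDict() for name in LIST_NAMES}


def place_at_mru(lists, name, entry, seq):
    # Remove the entry if already present, then append it at the MRU end.
    lists[name].pop(entry, None)
    lists[name][entry] = seq


def record_write(lists, entry, seq, is_dfw, is_sequential):
    # Blocks 78, 84, 86: pick the CFW (c, c') or DFW (d, d') list.
    family = "d" if is_dfw else "c"
    target = family if is_sequential else family + "'"
    for name in MODIFIED_LISTS:
        if name != target:
            lists[name].pop(entry, None)  # assumption: one list per entry
    place_at_mru(lists, target, entry, seq)
    # Blocks 80, 88, 90: pick the sequential or non-sequential list.
    keep, drop = ("a", "b") if is_sequential else ("b", "a")
    lists[drop].pop(entry, None)
    place_at_mru(lists, keep, entry, seq)
```

A sequential CFW write followed by a non-sequential DFW write of
the same entry moves the entry from lists 20c and 20a to lists
20d' and 20b.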

At block 82, the storage controller determines whether the
write is a DASD fast write (DFW) operation. If so, control
transfers to block 92; otherwise, control transfers to block 94
to end the program as the operation is a CFW and data does
not have to be placed in the NVS unit 14. If the write is a
DFW, then at block 92 the storage controller determines
whether the data entry to update is already in NVS 14. If so,
control transfers to block 96 which represents the storage
controller 8 updating the data entry in NVS 14 with the write
update. Otherwise, if the data entry is not already in NVS 14,
then control transfers to block 98 which represents the storage
controller 8 determining whether the addition of data to the
NVS 14 will exceed a predetermined threshold. If so, control
transfers to block 100 to demote data at the LRU end 26d,
d' of the DFW lists 20d, d' and destage the demoted data
from the NVS 14 to DASD 6. In preferred embodiments,
the storage controller 8 may select the NVS 14 entry to
demote that is the oldest entry on DFW lists 20d, d'. To
determine the oldest entry from the lists 20d, d' to demote,
the storage controller 8 may use the logic at blocks 50 et seq.
in FIG. 3 to determine whether the sequence numbers
"wrapped" when selecting the oldest entry from the lists
20d, d'. The storage controller 8 would then overlay the data
to update on the data entry destaged in NVS 14. In this way,
the demoted entry is destaged from NVS 14 and is no longer
modified data. In such case, the demoted data may be
maintained in cache 12 and remain on the sequential 20a or
non-sequential 20b lists. If the addition of the data update to
NVS 14 will not exceed a limit, then control transfers to
block 96 to add the update to NVS 14.
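
The NVS handling of blocks 92 through 100 can be sketched as
below. The sketch rests on several simplifying assumptions that
are not in the description: NVS 14 and DASD 6 are modeled as
Python dicts, the two DFW lists 20d, d' are merged into a single
list of entry ids ordered from LRU to MRU (so the wrap test is
not needed), the threshold is a simple entry count of at least
one, and all names are hypothetical.

```python
def dfw_update_nvs(nvs, dasd, dfw_lru_order, entry, data, nvs_limit):
    """Apply a DFW update to NVS; return the id of any destaged entry."""
    if entry in nvs:
        # Blocks 92 and 96: the entry is already in NVS, update in place.
        nvs[entry] = data
        dfw_lru_order.remove(entry)
        dfw_lru_order.append(entry)  # back to the MRU end
        return None
    destaged = None
    if len(nvs) + 1 > nvs_limit:
        # Blocks 98 and 100: demote the oldest DFW entry and destage it.
        destaged = dfw_lru_order.pop(0)
        dasd[destaged] = nvs.pop(destaged)
    # Block 96: add the update to NVS.
    nvs[entry] = data
    dfw_lru_order.append(entry)
    return destaged
```

With a limit of two entries, writing x, y, x again, and then z
destages y, because the second write to x moved x back to the
MRU end.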

If the data in DASD 6 to be updated is not in cache 12,
then control transfers to block 102 in FIG. 4b which
represents the storage controller 8 determining whether the
addition of the modified data to the cache 12 would exceed a
predetermined cache 12 capacity threshold. If so, control
transfers to block 104; otherwise, control transfers to block
106 which represents the storage controller 8 adding the
modified data to cache and then proceeding to block 108
where the storage controller executes blocks 78, 80, and 82
in parallel to modify the appropriate lists 20a, b, c, c', d, d'
to reflect the data added to cache, and to the NVS 14 if the
data is a DFW. If data needs to be demoted from cache 12
to make room for the new write data, then at block 104 the
storage controller 8 determines whether the addition of the
modified data to cache 12 will cause the sequential and
modified data thresholds to be exceeded.

The sequential threshold is exceeded when the number of
sequentially accessed data entries in cache exceeds a
predetermined threshold. The current number of sequential
entries in cache can be determined by examining the
sequential list 20a. The modified data threshold is exceeded
when the number of modified data entries exceeds a
predetermined threshold. The current number of modified data
entries in cache 12 can be determined by examining the total
number of entries in the CFW 20c, c' and DFW 20d, d' lists. If
the addition of the modified data to cache will exceed neither
the sequential nor the modified threshold, then control
transfers to block 110; otherwise control transfers to block 112.
Block 110 represents the storage controller 8 demoting the
oldest data entry at the LRU ends 26a, b of the sequential
20a and non-sequential 20b lists. The storage controller 8
may apply the logic of blocks 50 et seq. in FIG. 3 to select
the oldest entry in the event there is a wrap. After demoting
the oldest entry in cache 12, then control transfers to block
108 to update the lists 20a, b, c, c', d, d' by concurrently
executing blocks 78, 80, and 82.
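
The four-way choice made at blocks 104 through 122 of FIG. 4b,
i.e., which lists supply the entry to demote, can be summarized
in a small Python sketch. The function name, the boolean
parameters and the short list names are hypothetical; the block
numbers in the comments follow this description.

```python
def demotion_source(exceeds_sequential, exceeds_modified):
    """Return the lists (short names for 20a..20d') to demote from."""
    if not exceeds_sequential and not exceeds_modified:
        # Block 110: oldest entry across the sequential and
        # non-sequential lists.
        return ("a", "b")
    if exceeds_modified and not exceeds_sequential:
        # Block 114: oldest entry across the CFW and DFW lists.
        return ("c", "c'", "d", "d'")
    if exceeds_sequential and not exceeds_modified:
        # Block 118: LRU entry of the sequential list.
        return ("a",)
    # Block 122: oldest entry across the sequential CFW and DFW lists.
    return ("c", "d")
```

When more than one list is returned, the oldest LRU entry among
them would be chosen with the wrap test of blocks 50 et seq.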

If the addition of data to cache 12 exceeds either sequential
or modified thresholds or both, then at block 112, the
storage controller 8 determines whether the addition of the
modified data entry to cache 12 will not exceed the
sequential threshold, but exceed the modified threshold. If so,
control transfers to block 114; otherwise, control transfers to
block 116. At block 114, the storage controller 8 demotes the
oldest entry from the CFW 20c, c' and DFW 20d, d' lists.
This ensures that a modified data entry is demoted so the
addition of the new modified entry does not cause the
modified data threshold to be exceeded or further exceeded.
The storage controller 8 may apply the logic at blocks 50 et
seq. in FIG. 3. To select the two LRU entries from the lists
20c, c', d, d' to compare according to the logic at blocks 50
et seq., the storage controller 8 may select the oldest (lowest
sequence number) and most recent (highest sequence
number) of the LRU entries 26c, c', d, d' to compare. From
block 114, control transfers to block 108 to update the
appropriate lists to reflect the new modified data added to the
cache 12. At block 116, the storage controller determines
whether the addition of the new modified data to cache 12
will cause the sequential threshold to be exceeded, but not
the modified threshold to be exceeded. If so, control
transfers to block 118 which represents the storage controller 8
demoting the LRU entry 26a from the sequential 20a list to
not increase the current number of sequentially accessed
entries in cache 12.

Block 120 represents the storage controller determining
whether the addition of the modified data to cache will cause
both the sequential and modified thresholds to be exceeded.
If so, control transfers to block 122 to demote the oldest
LRU entry 26c, d from the CFW 20c and DFW 20d
sequential lists using the logic at block 50 et seq. in FIG. 3.
The operation at block 122 ensures that modified sequential
data is demoted from cache 12. The storage controller 8
would also modify one or more of the lists 20a, c, c', d,
d' to reflect the removal of the demoted data entry. In this
way the 20a, b, c, c', d, d' lists are used to ensure
that data is demoted in a manner that will not exceed certain
thresholds of data types in the cache 12, such as sequentially
accessed data and modified data.

In further embodiments, the storage controller 8 may
adjust the sequence number of sequentially accessed data
downward before placing sequentially accessed data on the
sequential lists 20a, c, d, thereby accelerating the
advancement of sequentially accessed data as a candidate for
removal from the cache 12. An example of an accelerated
sequence number for sequentially accessed data could be
calculated from equation (1) as follows:

$$\text{accelerated sequence number} = \text{MRU sequence count} - \tfrac{1}{2} \times (\text{LRU list member count}) \qquad (1)$$

Thus, the accelerated sequence number reduces the
current MRU sequence number by one-half the number of data
entries in the list to which the data entry will be added.
Those skilled in the art will appreciate that alternative
equations and methods could be used to adjust the sequence
number for sequentially accessed data to make such data a
more likely candidate for demotion than non-sequentially
accessed data.

The time sequence list 20e in FIG. 2b includes data
entries, i.e., pointers to data in cache 12, and a sequence
number indicating when the data entry was first modified in
cache 12. Thus, data may have been added to cache as part
of a read request, but will not be added to the time sequence
list 20e until such data has been modified in cache 12. The
time sequence list may be a user specified subset of tracks
in cache 12 or all modified data in cache 12 as indicated in
the CFW 20c, c' and DFW 20d, d' lists. If the data entry is
subsequently modified again, then the sequence number on
the time sequence list 20e remains unchanged even though
it may change on the CFW 20c, c' or DFW 20d, d' list. The
storage controller 8 further maintains a base sequence
number indicating a previous time stamp. In preferred
embodiments, the storage controller 8 may destage all data
entries from the time sequence list 20e having a sequence
number that is older than the base sequence number.

When modified data is added to the cache 12 or when a
current data entry is updated for the first time, then the
storage controller 8 would add the new modified data entry
to the time sequence list 20e and indicate the sequence
number when the modified data was first added. This step of
adding a new modified entry to the time sequence list 20e
may occur concurrently with the other parallel operations
78, 80, and 82 in FIG. 4a at the first instance the data is
modified in cache. Subsequent modifications of the data
entry will not change the entries in the time sequence list
20e.

FIG. 5 illustrates logic executed by the storage controller
8 to periodically destage data from the cache 12 and/or NVS
14 using the time sequence list 20e. When the storage
controller 8 initiates background destaging, the storage
controller 8 sets a base sequence number to the current MRU
sequence number. After a predetermined interval, the
storage controller 8 again initiates the background destaging
routine using the base sequence number set during the
execution of the previous background destaging operation to
determine if any data entries in the time sequence list 20e
have sequence numbers (indicating the time the data was
first modified) less than the base sequence number. Such a
situation may indicate that although the data entry was
placed on the list a while ago, the data entry has,
nonetheless, not been destaged. The logic of FIG. 5 ensures
that frequently updated data, which has its MRU sequence
number frequently reset to the current time value, gets
destaged to DASD 6. When a time sequence number is less
than the base sequence number, then the modified data was
likely updated and, in such case, may not be demoted using
the other lists 20a, b, c, c', d, d'. With the logic of FIGS. 3
and 4a, b, such frequently updated (modified) data would
not otherwise be destaged because the sequence number of
such data in the lists 20a, b, c, c', d, d' is reset each
time the data is accessed and placed back at the MRU end
24a, b, c, c', d, d' of the lists 20a, b, c, c', d, d' according to
that logic.

With respect to FIG. 5, control begins at block 130
which represents the storage controller 8 periodically
initiating a background destaging operation at predetermined
time intervals. Control transfers to block 132 which
represents the storage controller 8 accessing the data entry at the
LRU end 26e of the time sequence list 20e. Control then
transfers to block 134 which represents the storage
controller 8 determining whether the sequence number of the
accessed data entry in the list 20e is less than the base
sequence number, i.e., whether the data entry sequence
number is older than the base sequence number. If so,
control transfers to block 136 which represents the storage
controller 8 destaging the accessed data entry. To destage the
modified data, the copy of the modified data in the cache 12
or NVS unit 14 is destaged to the DASD 6. Control transfers
to block 140 which represents the storage controller 8
removing the destaged data from the CFW 20c, c' or DFW
20d, d' lists and the time sequence list 20e. However, the
destaged data may still remain in cache 12 and on the
sequential 20a and non-sequential 20b lists as unmodified
data. Control then transfers to block 144 which represents
the storage controller 8 accessing the next data entry in the
list closer to the MRU end 24e addressed by the up pointer
of the current, just considered, data entry. From block 144
control proceeds back to block 134 to process further entries
on the list 20e.

If the data entry sequence number is not less than the base
sequence number, then control transfers to block 138 which 
represents the storage controller 8 setting the base sequence 
number to the current MRU sequence number and then back 
to block 130 to initiate the background destaging operation 
after a predetermined interval. If the accessed data entry 
sequence number is greater than the base sequence number, 
then the data entry is more recent than the base sequence 
number. As the list is scanned from the LRU end 26e to the 
MRU end 24e, no further data entries in the time sequence 
list 20e need be destaged. Once the considered data entry in 
list 20e has a more recent time value than the base sequence 
number, no further entries closer to the MRU end 24e would
have a time value earlier than the base sequence number.
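
The background destaging scan of FIG. 5 can be sketched in
Python as follows. This is an illustration under stated
assumptions: the time sequence list 20e is modeled as a deque of
(entry, first-modified sequence number) pairs with the LRU end
at the left, wrapping of sequence numbers is ignored, and the
function name and the destage callback are hypothetical.

```python
from collections import deque


def background_destage(time_list, base_seq, current_mru_seq, destage):
    """Destage entries older than base_seq; return the new base and
    the list of destaged entry ids."""
    destaged = []
    # Blocks 132, 134, 144: scan from the LRU end toward the MRU end.
    while time_list and time_list[0][1] < base_seq:
        # Blocks 136 and 140: destage the entry and drop it from the list.
        entry, _first_modified = time_list.popleft()
        destage(entry)
        destaged.append(entry)
    # Block 138: the next run uses the current MRU sequence number.
    return current_mru_seq, destaged
```

The scan stops at the first entry whose sequence number is not
less than the base sequence number, because every entry closer
to the MRU end was first modified at least as recently.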

Alternative Embodiments and Conclusion 

The preferred embodiments may be implemented as a
method, apparatus or article of manufacture using standard 
programming and/or engineering techniques to produce 
software, firmware, hardware, or any combination thereof. 
The term "article of manufacture" (or alternatively, "com- 
puter program product") as used herein is intended to 
encompass one or more computer programs and data files 
accessible from one or more computer-readable devices, 
carriers, or media, such as magnetic storage media, "floppy
disk," CD-ROM, a file server providing access to the 
programs via a network transmission line, holographic unit, 
etc. Of course, those skilled in the art will recognize that 
many modifications may be made to this configuration 
without departing from the scope of the present invention. 

Preferred embodiments were described with respect to the 
IBM mainframe environment, where a storage controller 
unit interfaces numerous host systems with a DASD. 
However, those skilled in the art will appreciate that the 
preferred caching algorithms could apply to any data trans- 
fer interface known in the art, including SCSI, ST-506/ST- 
412, IDE/ATA, Enhanced Small Device Interface (ESDI),
floppy disk, parallel port, ATA, EIDE, ATA-2, Fast ATA,
Ultra ATA, etc., where data is cached. 

Preferred embodiments were described with respect to a 
sequence number used to determine the length of time data 
has remained unaccessed in the cache. In alternative 
embodiments, different methods may be used to maintain 
time information for data entries in the cache, such as a time 
stamp. 
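
As a worked illustration of how the sequence number is used as
a time value, the accelerated sequence number of equation (1)
can be computed as sketched below. The function and parameter
names are hypothetical, and two points are assumptions not
stated in the description: one-half of the member count is
taken with integer division, and wrapping of the counter is not
modeled.

```python
def accelerated_sequence_number(mru_seq_count, list_member_count):
    """Equation (1): the MRU sequence count reduced by one-half the
    number of entries in the list that will receive the data entry."""
    # Integer halving is an assumption made for this sketch.
    return mru_seq_count - list_member_count // 2
```

For a current MRU sequence count of 1000 and a list holding 200
entries, the sequentially accessed entry would be placed with
sequence number 900, so it appears older than data placed at
the same moment without the adjustment.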

The logic of FIGS. 3-5 may be implemented in microcode 
accessible to the storage controller 8 or as part of an 
application the storage controller 8 executes. Still further, 
the logic of FIGS. 3-5 may be executed in hardwired 
circuitry dedicated to managing the cache 12. Alternatively, 
certain of the logic of FIGS. 3-5 may be performed by the 
host system 4a, b, c. The logic of FIGS. 3-5 is for illustrative
purposes. Certain steps may be modified or removed alto- 
gether and other steps added. Further, the order of the steps 
performed may also vary from the described embodiments. 

In preferred embodiments, the data maintained in cache 
may be any data set or format, including fixed block CKD
track, record, stripe, etc. Moreover, preferred embodiments
were described as removing and adding data to cache 12 in
response to DARs. In alternative embodiments, the logic 
managing the cache may add and remove data as part of data 
management operations unrelated to specific DARs. 

Preferred embodiments were described using seven lists 
20a, b, c, c', d, d', e together to manage cache. However, in
alternative embodiments, the lists may be used separately
and independently of each other. For instance, only the
sequential 20a and non-sequential 20b lists may be used to
prevent sequentially accessed data from dominating cache;
only the CFW 20c, c' or DFW 20d, d' lists may be used to
destage data; and only the time sequence list 20e may be
used to ensure that modified data is destaged as of a
particular time. In yet further embodiments, there may be
different lists, such as a single DFW or CFW list of all DFW
or CFW entries, both sequential and non-sequential.

In preferred embodiments, the lists were described as 
implemented as doubly linked list data structures comprised 
of lists or control blocks. Those skilled in the art will
appreciate that alternative data structures may be utilized to
implement the lists. 
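
A minimal Python sketch of one such doubly linked list is given
below. It assumes, consistent with the description of the
anchor entry, that each list has MRU and LRU pointers and that
each data entry has an up pointer (toward the MRU end) and a
down pointer (toward the LRU end). The class and method names
are hypothetical, and remove() assumes the entry is currently
on the list.

```python
class Entry:
    def __init__(self, key, seq):
        self.key = key
        self.seq = seq    # sequence number assigned at MRU placement
        self.up = None    # neighbor closer to the MRU end
        self.down = None  # neighbor closer to the LRU end


class AnchoredList:
    """Plays the role of an anchor entry with MRU and LRU pointers."""

    def __init__(self):
        self.mru = None
        self.lru = None

    def add_at_mru(self, entry):
        entry.up, entry.down = None, self.mru
        if self.mru is not None:
            self.mru.up = entry
        else:
            self.lru = entry  # first entry is both MRU and LRU
        self.mru = entry

    def remove(self, entry):
        # Neighbors are made to address each other; the anchor
        # pointers are fixed up when an end entry is removed.
        if entry.up is not None:
            entry.up.down = entry.down
        else:
            self.mru = entry.down
        if entry.down is not None:
            entry.down.up = entry.up
        else:
            self.lru = entry.up
        entry.up = entry.down = None

    def demote_lru(self):
        victim = self.lru
        if victim is not None:
            self.remove(victim)
        return victim

    def keys_lru_to_mru(self):
        keys, node = [], self.lru
        while node is not None:
            keys.append(node.key)
            node = node.up
        return keys
```

Re-placing an entry that is already on the list is then a
remove() followed by add_at_mru(), matching the two-step
sequence described for the preferred embodiments.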

Preferred embodiments were described with respect to 
managing a cache that buffers data from a DASD. The logic 
of the preferred embodiments could be used to manage 
cache that buffers data from any type of memory device, 
non-volatile as well as volatile, to another cache memory,
which may be of a higher speed providing faster access. For
instance, data from a DRAM or RAM can be buffered in a
higher speed cache, such as a cache that is on-board a
microprocessor, e.g., the L2 cache used with the
PENTIUM® II microprocessor. PENTIUM II is a registered
trademark of Intel Corporation. 

In summary, preferred embodiments disclose a cache
management scheme using multiple data structures. First
and second data structures indicate data entries in a cache.
Each data structure has a most recently used (MRU) entry,
a least recently used (LRU) entry, and a time value
associated with each data entry indicating a time the data entry
was indicated as added to the MRU entry of the data structure.
A processing unit receives a new data entry. In response, the
processing unit processes the first and second data structures
to determine a LRU data entry in each data structure and
selects from the determined LRU data entries the LRU data
entry that is the least recently used. The processing unit then
demotes the selected LRU data entry from the cache and
data structure including the selected data entry. The
processing unit adds the new data entry to the cache and
indicates the new data entry as located at the MRU end of one
of the first and second data structures.

The foregoing description of the preferred embodiments 
of the invention has been presented for the purposes of 
illustration and description. It is not intended to be
exhaustive or to limit the invention to the precise form disclosed.
Many modifications and variations are possible in light of 
the above teaching. It is intended that the scope of the 
invention be limited not by this detailed description, but 
rather by the claims appended hereto. The above 
specification, examples and data provide a complete descrip- 
tion of the manufacture and use of the composition of the 
invention. Since many embodiments of the invention can be 
made without departing from the spirit and scope of the 
invention, the invention resides in the claims hereinafter 
appended. 

What is claimed is: 

1. A method for caching data, comprising the steps of:

providing first and second data structures indicating data
entries in a cache, wherein each data structure has a
most recently used (MRU) entry, a least recently used
(LRU) entry, and a time value associated with each data
entry indicating a time the data entry was indicated as
added to the MRU entry of the data structure;

receiving a new data entry;

processing the first and second data structures to
determine a LRU data entry in each data structure;

selecting from the determined LRU data entries the LRU
data entry that is the least recently used;

demoting the selected LRU data entry from the cache and
data structure including the selected data entry;

adding the new data entry to the cache; and

indicating the new data entry as located at the MRU entry
of one of the first and second data structures.

2. The method of claim 1, wherein the first data structure
indicates data entries in the cache sequentially accessed and
the second data structure indicates data entries in the cache 
non-sequentially accessed, wherein the step of selecting the 
LRU data entries comprises the steps of: 

determining whether adding the new data entry to cache 
would cause the number of sequentially accessed data 
entries to exceed a threshold, wherein the step of 
selecting from the determined LRU data entries the 
LRU data entry that is least recently used occurs after 
determining that adding the new data entry to cache 
would not cause the sequentially accessed data entries 
to exceed the threshold; and 

selecting the LRU data entry from the first data structure 
to demote after determining that adding the new data 
entry to cache would cause the sequentially accessed 
data entries to exceed the threshold. 

3. The method of claim 2, further comprising the steps of:

receiving a data access request for requested data in the
cache;

returning the requested data from the cache;

determining whether the data access request is a
sequential access;

indicating that the requested data is located at the MRU
entry of the first data structure after determining that
the data access request is a sequential access; and

indicating that the requested data is located at the MRU
entry of the second data structure after determining that
the data access request is a non-sequential access.

4. The method of claim 1, wherein the first data structure 
indicates data entries written to the cache as part of a first 
type of write operation and the second data structure indi- 
cates data entries written to the cache as part of a second type 
of write operation, wherein data written to the cache as part 
of the first type of write operation is also written to a first 
storage unit, and wherein the new data entry comprises data
to write to a second storage unit, further comprising the steps 
of: 

determining whether the new data entry is of the first write 
operation type; 

selecting the LRU data entry from the first data structure
to destage from the first storage unit to the second
storage unit after determining that the new data entry is
of the first write operation type; 

indicating the selected LRU data entry as removed from 
the first data structure; and 

adding the new data entry to the first storage unit, wherein 
the step of indicating the new data entry as located at 
the MRU entry of one of the first and second data 
structures comprises indicating the new data entry as
located at the MRU entry of the first data structure. 

5. The method of claim 4, further comprising the steps of:

receiving modified data for a data entry that is already in
cache;

determining whether the received modified data is of the
first type of write operation;

updating the data entry in cache with the received
modified data;

updating the data entry in the first storage unit with the
received modified data after determining that the
received modified data is of the first type of write
operation;

indicating that the updated data entry is at the MRU entry
of the first data structure after determining that the
modified data is of the first type of write operation; and

indicating that the updated data entry is at the MRU entry
of the second data structure after determining that the
data access request is of the second type of write
operation.

6. The method of claim 4, further comprising the steps of:

providing a third data structure indicating data entries in
cache sequentially accessed and a fourth data structure
indicating data entries in cache non-sequentially
accessed, wherein the third and fourth data structures
have an MRU entry and an LRU entry;

determining whether the new data entry is sequentially
accessed data;

indicating that the new data entry is at the MRU entry of
the third data structure after determining that the new
data entry is sequentially accessed; and

indicating that the new data entry is at the MRU entry of
the fourth data structure after determining that the new
data entry is non-sequentially accessed.

7. The method of claim 1, further comprising third and
fourth data structures, wherein the first data structure indi- 
cates modified data entries written in cache and a storage 
unit as part of a sequential write operation, wherein the 
second data structure indicates modified data entries written 
in cache and the storage unit as part of a non-sequential write 
operation, wherein the third data structure indicates modi- 
fied data entries written in cache and not the storage unit as 
part of a sequential write operation, and wherein the fourth 
data structure indicates modified data entries written in 
cache and not in the storage unit as part of a non-sequential 
write operation, wherein the step of processing the first and 
second data structures further comprises processing the third 
and fourth data structures to determine an LRU data entry 
from the first, second, third, and fourth data structures. 

8. The method of claim 7, further comprising the step of 
providing a modified data threshold indicating a maximum 
number of modified data entries to maintain in cache, 
wherein the step of processing the four data structures 
occurs after determining that the addition of modified data to 
the cache will cause the modified data threshold to be 
exceeded. 

9. The method of claim 8, further comprising a fifth data 
structure indicating the number of entries in the cache 
sequentially accessed and a sixth data structure indicating 
the number of entries in cache non-sequentially accessed, 
further comprising the steps of: 

providing a sequentially accessed threshold indicating a 
maximum number of sequentially accessed data entries 
to maintain in cache; 

determining whether the addition of a data entry to the 
cache will cause the number of data entries in cache to 
exceed the modified and sequentially accessed 
thresholds, wherein the step of processing the first, 
second, third, and fourth data structures occurs after
determining that the addition of modified data to the 
cache will exceed the modified data threshold and not 
exceed the sequentially accessed threshold; 

processing the fifth and sixth data structures to determine 
LRU data entries after determining that the modified 
threshold is not exceeded, wherein the steps of select- 
ing and demoting comprises selecting and demoting the 
LRU data entry that is the least recently used entry from 
the fifth and sixth data structures; and 

processing the first and third data structures to determine 
LRU data entries after determining that the modified 
and sequentially accessed thresholds are both
exceeded, wherein the steps of selecting and demoting 
comprises selecting and demoting the LRU data entry 
that is the least recently used entry from the first and 
third data structures. 

10. The method of claim 1, wherein the time value cannot 
exceed a maximum time value, wherein after reaching the 
maximum time value the time value resets to zero, further 
comprising the step of determining whether the time value 
for at least one of the LRU data entries was reset to zero, 
wherein the step of selecting the LRU data entry comprises 
selecting the LRU data entry that has a time value that was 
previously reset. 

11. The method of claim 10, wherein the time value 
comprises a sequence number that is incremented when data 
entries are indicated as added to the MRU entry of the data 
structures, wherein the maximum time value comprises a 
maximum sequence number, and wherein the LRU data
entry in a data structure has the lowest sequence number in 
the data structure, wherein the step of determining whether 
the time value for an LRU data entry was reset to zero 
comprises the steps of: 

(i) determining a difference between the sequence num- 
bers of the LRU data entries in the first and second data 
structures; and 

(ii) determining whether the difference between the 
sequence numbers is greater than half the maximum 
sequence number; and 

wherein the step of selecting the LRU data entry from the
data structures comprises the steps of: 

(i) selecting the LRU data entry having a largest sequence 
number after determining that the difference between 
the LRU data entries is greater than half the maximum 
sequence number; and 

(ii) selecting the LRU data entry having a lowest sequence 
number after determining that the difference between 
the LRU data entries is less than half the maximum 
sequence number.
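
The wraparound test recited in claim 11 can be illustrated with a short Python sketch. It is an illustration only, under stated assumptions: the function name `select_lru_sequence` and its parameter names are hypothetical, and the sketch assumes the two LRU entries were added less than half a counter cycle apart.

```python
def select_lru_sequence(seq_first: int, seq_second: int, max_seq: int) -> int:
    """Return the sequence number of the LRU entry judged least recently used.

    seq_first and seq_second are the sequence numbers of the LRU entries of
    the first and second data structures (hypothetical parameter names).
    """
    difference = abs(seq_first - seq_second)
    if difference > max_seq / 2:
        # A gap this large suggests the counter wrapped to zero between the
        # two insertions, so the larger number is the older, pre-reset one.
        return max(seq_first, seq_second)
    # No wrap detected: the lower sequence number is the older entry.
    return min(seq_first, seq_second)
```

With a maximum sequence number of 255, LRU entries numbered 250 and 3 are treated as straddling a reset, so 250 is selected; entries numbered 10 and 20 are not, so 10 is selected.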

12. The method of claim 1, further comprising a third data 
structure indicating data entries in the first and second data 
structures, wherein each data entry indicated in the third data 
structure has a modified time value indicating when the data
entry was first modified in cache, further comprising the 
steps of: 

executing a routine at predetermined intervals to destage
data from cache to a storage unit;

providing a base time value indicating a previous time
value;

determining, when executing the routine, data entries in
the third data structure having a time value that is older
than the base time value;

destaging, for each determined data entry in the third data
structure, the copy of the modified data maintained in
cache to the storage unit; and

indicating the destaged data entry as removed from the
third linked list.
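
The periodic destage routine of claim 12 might be sketched as follows. This is a minimal illustration, not the patented implementation: the third data structure is modeled as a plain dictionary from entry key to modified time value, "older" is taken to mean numerically smaller (counter wraparound is ignored), and all names are hypothetical.

```python
def destage_old_entries(modified_times, cache, storage, base_time):
    """Destage every entry first modified before base_time; return their keys.

    modified_times models the third data structure (key -> modified time),
    cache and storage are dictionaries standing in for the cache and the
    storage unit. All four names are illustrative assumptions.
    """
    # Collect the keys first so the dictionary is not mutated while iterating.
    old_keys = [key for key, stamp in modified_times.items() if stamp < base_time]
    for key in old_keys:
        storage[key] = cache[key]   # copy the modified data to the storage unit
        del modified_times[key]     # indicate the entry as removed from the third structure
    return old_keys
```

A scheduler would call this at each predetermined interval and then advance `base_time`, so that entries modified since the previous run are picked up on a later pass.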

13. A system for caching data, comprising: 
a processing unit; 

a cache including data entries; 

a memory area including first and second data structures 
accessible to the processing unit, wherein the data 
structures indicate data entries in the cache, wherein 
each data structure has a most recently used (MRU) 
entry, a least recently used (LRU) entry, and a time 
value associated with each data entry indicating a time 
the data entry was indicated as added to the MRU entry
of the data structure; 
program logic executed by the processing unit, compris- 
ing: 

(i) means for processing the first and second data
structures to determine a LRU data entry in each data
structure;

(ii) means for selecting from the determined LRU data
entries the LRU data entry that is the least recently
used;

(iii) means for demoting the selected LRU data entry
from the cache and data structure including the
selected data entry;

(iv) means for adding a new data entry to the cache; and

(v) means for indicating the new data entry as located
at the MRU entry of one of the first and second data
structures.
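
The program logic of elements (i) through (v) can be sketched in Python. This is a hedged illustration under several assumptions: each data structure is modeled as an `OrderedDict` whose first item is the LRU entry and whose last item is the MRU entry, the time value is a simple increasing counter, keys are assumed to be new when added, and the class and method names are hypothetical.

```python
from collections import OrderedDict
from itertools import count


class TwoListCache:
    """Cache whose entries are tracked on two LRU/MRU lists (illustrative)."""

    def __init__(self, capacity):
        self.capacity = capacity  # assumed to be at least 1
        # Each list maps key -> (time value, data); first item is the LRU entry.
        self.lists = (OrderedDict(), OrderedDict())
        self._clock = count()

    def __len__(self):
        return sum(len(lst) for lst in self.lists)

    def add(self, key, value, which):
        """Add a new entry at the MRU end of list `which` (0 or 1).

        Returns the key demoted to make room, or None if nothing was demoted.
        """
        demoted = None
        if len(self) >= self.capacity:
            demoted = self._demote_lru()
        self.lists[which][key] = (next(self._clock), value)
        return demoted

    def _demote_lru(self):
        # (i) find the LRU entry of each non-empty list
        candidates = []
        for lst in self.lists:
            if lst:
                key, (stamp, _) = next(iter(lst.items()))
                candidates.append((stamp, key, lst))
        # (ii) select the candidate with the oldest time value
        stamp, key, lst = min(candidates, key=lambda c: c[0])
        # (iii) demote it from the cache and from its list
        del lst[key]
        return key
```

Comparing only the two list heads keeps the demotion decision at constant cost regardless of how many entries each list holds, which is the point of storing a time value with each entry.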

14. The system of claim 13, wherein the data structures 
are linked lists, wherein the MRU entry is at one end of the
linked list and the LRU entry is at another end of the linked
list.

15. The system of claim 13, wherein the cache and 
memory area are included in a first memory device, further 
comprising a second memory device, wherein the cache
stores data accessed from the secondary memory device.

16. The system of claim 13, wherein the first data structure
indicates data entries in the cache sequentially accessed
and the second data structure indicates data entries in the
cache non-sequentially accessed, wherein the program logic
further comprises:

means for determining whether adding the new data entry 
to cache would cause the number of sequentially 
accessed data entries to exceed a threshold; and 
means for selecting the LRU data entry from the first data
structure to demote after determining that adding the
new data entry to cache would cause the sequentially 
accessed data entries to exceed the threshold. 
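
The threshold test of claim 16 can be illustrated with a small Python function. It is a sketch under assumptions: the new entry is taken to be sequentially accessed, both lists are assumed non-empty, the fallback branch follows the comparison of LRU time values recited in claim 13, and every name is hypothetical.

```python
def choose_list_to_demote(seq_count, seq_threshold, seq_lru_time, nonseq_lru_time):
    """Return "sequential" or "non-sequential": the list that gives up its LRU entry.

    seq_count is the current number of sequentially accessed entries;
    seq_lru_time and nonseq_lru_time are the time values of the two LRU entries.
    """
    if seq_count + 1 > seq_threshold:
        # Adding the entry would exceed the sequential threshold, so the
        # sequential list must give up its LRU entry.
        return "sequential"
    # Otherwise demote whichever LRU entry is older overall.
    return "sequential" if seq_lru_time <= nonseq_lru_time else "non-sequential"
```

The effect is to stop a long sequential scan from pushing randomly accessed data out of the cache once sequential data has reached its allotted share.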

17. The system of claim 16, wherein the program logic 
further comprises: 

means for receiving a data access request for requested 
data in the cache; 

means for returning the requested data from the cache; 

means for determining whether the data access request is 
a sequential access; 

means for indicating that the requested data is located at 
the MRU entry of the first data structure after deter- 
mining that the data access request is a sequential 
access; and 

means for indicating that the requested data is located at
the MRU entry of the second data structure after 
determining that the data access request is a non- 
sequential access. 

18. The system of claim 13, wherein the first data structure
indicates data entries written to the cache as part of a
first type of write operation and the second data structure
indicates data entries written to the cache as part of a second
type of write operation, further comprising:

a first storage unit, wherein data written to the cache as
part of the first type of write operation is also written to
a first storage unit;

a second storage unit;

wherein the program logic further comprises: 

(i) means for determining whether the new data entry is
part of the first write operation;

(ii) means for selecting the LRU data entry from the
first data structure to destage from the first storage
unit to the second storage unit after determining that
the new data entry is part of the first write operation;
and

(iii) means for adding the new data entry to the first
storage unit, wherein the step of indicating the new
data entry as located at the MRU entry of one of the
first and second data structures comprises indicating
the new data entry as located at the MRU entry of the
first data structure.

19. The system of claim 18, wherein the program logic 
further comprises: 

means for processing modified data for a data entry that
is already in cache;

means for determining whether the modified data is one of
the first type of write operation;

means for updating the data entry in cache with the
received modified data;

means for updating the data entry in the first storage unit
with the received modified data after determining that
the received modified data is of the first type of write
operation;

means for indicating that the updated data entry is at the 
MRU entry of the first data structure after determining 
that the modified data is of the first type of write 
operation; and 

means for indicating that the updated data entry is at the 
MRU entry of the second data structure after determin- 
ing that the data access request is of the second type of 
write operation. 

20. The system of claim 13, wherein the memory area 
further comprises a third data structure indicating data 
entries in the first and second data structures, wherein each 
data entry in the third data structure has a modified time 
value indicating when the data entry was first modified in 
cache, wherein the program logic further comprises:

means for executing a routine at predetermined intervals 
to destage data from cache to a storage unit; 

means for providing a base time value indicating a 
previous time value; 

means for determining, when executing the routine, data 
entries in the third data structure having a time value 
that is older than the base time value; and 

means for destaging, for each determined data entry in the
third data structure, the copy of the modified data 
maintained in cache to the storage unit; and 

means for indicating the destaged data entry as removed 
from the third data structure. 

21. An article of manufacture for use in programming a 
processing unit to manage cache, the article of manufacture 
comprising at least one computer readable storage device 
including at least one computer program embedded therein 
that causes the processing unit to perform the steps of:

providing a first and second data structures indicating data 
entries in a cache, wherein each data structure has a 
most recently used (MRU) entry, a least recently used 
(LRU) entry, and a time value associated with each data 
entry indicating a time the data entry was indicated as 
added to the MRU entry of the data structure; 

receiving a new data entry; 

processing the first and second data structures to deter- 
mine a LRU data entry in each data structure; 

selecting from the determined LRU data entries the LRU 
data entry that is the least recently used; 

demoting the selected LRU data entry from the cache and 
data structure including the selected data entry; 
adding the new data entry to the cache; and

indicating the new data entry as located at the MRU entry
of one of the first and second data structures.

22. The article of manufacture of claim 21, wherein the
first data structure indicates data entries in cache sequentially
accessed and the second data structure indicates data
entries in cache non-sequentially accessed, wherein the step
of selecting the LRU data entries comprises causing the 
processing unit to perform the steps of: 

determining whether adding the new data entry to cache 
would cause the number of sequentially accessed data 
entries to exceed a threshold, wherein the step of 
selecting from the determined LRU data entries the 
LRU data entry that is least recently used occurs after 
determining that adding the new data entry to cache 
would not cause the sequentially accessed data entries 
to exceed the threshold; and 

selecting the LRU data entry from the first data structure
to demote after determining that adding the new data
entry to cache would cause the sequentially accessed
data entries to exceed the threshold.

23. The article of manufacture of claim 22, further com- 
prising the steps of: 

receiving a data access request for requested data in the
cache; 

returning the requested data from the cache; 
determining whether the data access request is a sequen- 
tial access; 

indicating that the requested data is located at the MRU 
entry of the first data structure after determining that 
the data access request is a sequential access; and 

indicating that the requested data is located at the MRU 
entry of the second data structure after determining that 
the data access request is a non-sequential access. 

24. The article of manufacture of claim 21, wherein the 
first data structure comprises data entries written to the 
cache as part of a first type of write operation and the second 
data structure comprises data entries written as part of a 
second type of write operation, wherein data written to the 
cache as part of the first type of write operation is also 
written to a first storage unit, and wherein the new data entry 
comprises data to write to a second storage unit, further 
comprising the steps of: 

determining whether the new data entry is of the first write 
operation type; 

selecting the LRU data entry from the first data structure 
to destage from the first storage unit to the second 
storage unit after determining that the new data entry is
of the first type of write operation type; 

indicating the selected LRU data entry as removed from 
the first data structure; and 

adding the new data entry to the first storage unit, wherein 
the step of indicating the new data entry as located at 
the MRU entry of one of the first and second data 
structures comprises indicating the new data entry as 
located at the MRU entry of the first data structure. 

25. The article of manufacture of claim 24, further com- 
prising the steps of: 

receiving modified data for a data entry that is already in 
cache; 

determining whether the received modified data is of the 
first type of write operation;

updating the data entry in cache with the received modi- 
fied data; 




updating the data entry in the first storage unit with the
received modified data after determining that the 
received modified data is of the first type of write 
operation; 

indicating that the updated data entry is at the MRU entry 
of the first data structure after determining that the 
modified data is of the first type of write operation; and 

indicating that the updated data entry is at the MRU entry 
of the second data structure after determining that the 
modified data is of the second type of write operation. 

26. The article of manufacture of claim 23, further comprising
the steps of:

providing a third data structure indicating data entries in 
cache sequentially accessed and a fourth data structure 
of data entries in cache non-sequentially accessed,
wherein the third and fourth data structures have an 
MRU entry and an LRU entry; 

determining whether the new data entry is sequentially 
accessed data; 

indicating that the new data entry is at the MRU entry of 
the third data structure after determining that the new 
data entry is sequentially accessed; and 

indicating that the new data entry is at the MRU entry of 
the fourth data structure after determining that the new 
data entry is non-sequentially accessed.

27. The article of manufacture of claim 21, further com- 
prising a third and fourth data structures, wherein the first 
data structure indicates modified data entries written in 
cache and a storage unit as part of a sequential write 
operation, wherein the second data structure indicates modi- 
fied data entries written in cache and the storage unit as part 
of a non-sequential write operation, wherein the third data 
structure indicates modified data entries written in cache and 
not the storage unit as part of a sequential write operation, 
and wherein the fourth data structure indicates modified data 
entries written in cache and not in the storage unit as part of 
a non-sequential write operation, wherein the step of pro- 
cessing the first and second data structures further comprises 
processing the third and fourth data structures to determine 
an LRU data entry from the first, second, third, and fourth 
data structures. 

28. The article of manufacture of claim 27, further com- 
prising the step of providing a modified data threshold 
indicating a maximum number of modified data entries to 
maintain in cache, wherein the step of processing the four 
data structures occurs after determining that the addition of 
modified data to the cache will cause the modified data 
threshold to be exceeded. 

29. The article of manufacture of claim 28, further com- 
prising a fifth data structure indicating the number of entries 
in the cache sequentially accessed and a sixth data structure 
indicating the number of entries in cache non-sequentially 
accessed, further comprising the steps of: 

providing a sequentially accessed threshold indicating a 
maximum number of sequentially accessed data entries 
to maintain in cache; 

determining whether the addition of a data entry to the 
cache will cause the number of data entries in cache to 
exceed the modified and sequentially accessed 
thresholds, wherein the step of processing the first, 
second, third, and fourth data structures occurs after 
determining that the addition of modified data to the 
cache will exceed the modified data threshold and not 
exceed the sequentially accessed threshold; 

processing the fifth and sixth data structures to determine 
LRU data entries after determining that the modified 
threshold is not exceeded, wherein the steps of selecting
and demoting comprises selecting and demoting the
LRU data entry that is the least recently used entry from 
the fifth and sixth data structures; and 
processing the first and third data structures to determine 
LRU data entries after determining that the modified 
and sequentially accessed thresholds are both 
exceeded, wherein the steps of selecting and demoting 
comprises selecting and demoting the LRU data entry 
that is the least recently used entry from the first and 
third data structures. 

30. The article of manufacture of claim 21, wherein the 
time value cannot exceed a maximum time value, wherein 
after reaching the maximum time value the time value resets 
to zero, further comprising the step of determining whether 
the time value for at least one of the LRU data entries was 
reset to zero, wherein the step of selecting the LRU data 
entry comprises selecting the LRU data entry that has a time 
value that was previously reset. 

31. The article of manufacture of claim 21, wherein the 
time value comprises a sequence number that is incremented 
when data entries are indicated as added to the MRU entry 
of the data structures, wherein the maximum time value 
comprises a maximum sequence number, and wherein the 
LRU data entry in a data structure has the lowest sequence 
number in the data structure, wherein the step of determining
whether the time value for an LRU data entry was reset to 
zero comprises the steps of: 

(i) determining a difference between the sequence num- 
bers of the LRU data entries in the first and second data 
structures; and 

(ii) determining whether the difference between the 
sequence numbers is greater than half the maximum 
sequence number; and 

wherein the step of selecting the LRU data entry from the 
data structures comprises the steps of: 

(i) selecting the LRU data entry having a largest sequence 
number after determining that the difference between 
the LRU data entries is greater than half the maximum 
sequence number; and 

(ii) selecting the LRU data entry having a lowest sequence 
number after determining that the difference between 
the LRU data entries is less than half the maximum 
sequence number. 

32. The article of manufacture of claim 21, further com- 
prising a third data structure indicating data entries in the 
first and second data structures, wherein each data entry in 
the third data structure has a modified time value indicating 
when the data entry was first modified in cache, further 
comprising the steps of: 

executing a routine at predetermined intervals to destage
data from cache to a storage unit;

providing a base time value indicating a previous time
value;

determining, when executing the routine, data entries in
the third data structure having a time value that is older
than the base time value;

destaging, for each determined data entry in the third data
structure, the copy of the modified data maintained in
cache to the storage unit; and

indicating the destaged data entry as removed from the
third data structure.

33. A memory area accessible to a processing unit, 
wherein the memory unit stores: 

a plurality of data entries; 
a first and second data structures indicating the data
entries, wherein each data structure has a most recently 
used (MRU) entry, a least recently used (LRU) entry, 
and a time value associated with each data entry 
indicating a time the data entry was indicated as added 
to the MRU entry of the data structure; and

a new data entry added to the memory area, wherein the 
processing unit processes the first and second data 
structures to determine a LRU data entry in each data
structure and selecting from the determined LRU data
entries the LRU data entry that is the least recently 
used, wherein the processing unit demotes the selected 
LRU data entry from the cache and data structure 
including the selected data entry, and wherein the 
processing unit indicates that the new data entry as 
located at the MRU entry of one of the first and second 
data structures. 

34. The memory area of claim 33, wherein the first and 
second data structures are comprised of linked lists in cache, 
wherein the MRU entry is at one end of the linked list and
the LRU entry is at the other end of the linked list.

35. The memory area of claim 33, wherein the first data structure
indicates data entries in cache sequentially accessed and
the second data structure indicates data entries in cache
non-sequentially accessed, wherein the processing unit
determines whether adding the new data entry to the
memory area would cause the number of sequentially
accessed data entries to exceed a threshold, and wherein the
processing unit selects the LRU data entry from the first data
structure to demote after determining that adding the new
data entry to cache would cause the sequentially accessed
data entries to exceed the threshold.

36. The memory area of claim 35, wherein the processing 
unit indicates that a data entry to which a data access request 
was directed is located at the MRU entry of the first data 
structure after determining that the data access request is a 
sequential access, and wherein the processing unit indicates 
that the data entry to which the data access request was 
directed is located at the MRU entry of the second data 
structure after determining that the data access request is a 
non-sequential access. 

37. The memory area of claim 33, wherein the first data 
structure indicates data entries written to the cache as part of 
a first type of write operation and the second data structure 
indicates data entries written to the cache as part of a second 
type of write operation, wherein the processing unit writes 
data written to the cache as part of the first type of write 
operation to a first storage unit, wherein the processing unit 
selects the LRU data entry from the first data structure to 
destage from the first storage unit to a second storage unit 
after determining that the new data entry is part of the first
write operation, and wherein the processing unit adds the 
new data entry to the first storage unit, wherein the step of 
indicating the new data entry as located at the MRU entry of 
one of the first and second data structures comprises indi- 
cating the new data entry as located at the MRU entry of the 
first data structure. 

38. The memory area of claim 33, further comprising a 
third and fourth data structures, wherein the first data 
structure indicates modified data entries written in cache and 
a storage unit as part of a sequential write operation, wherein 
the second data structure indicates modified data entries 
written in cache and the storage unit as part of a non- 
sequential write operation, wherein the third data structure 
indicates modified data entries written in cache and not the 
storage unit as part of a sequential write operation, and 
wherein the fourth data structure indicates modified data 
entries written in cache and not in the storage unit as part of 
a non-sequential write operation, wherein the step of processing
the first and second data structures further comprises
processing the third and fourth data structures to
determine an LRU data entry from the first, second, third, 
and fourth data structures. 

39. The memory area method of claim 33, wherein the 
memory area further comprises a third data structure indi- 
cating data entries in cache sequentially accessed and a 
fourth data structure indicating data entries in cache
non-sequentially accessed, wherein the third and fourth data
structures have an MRU entry and an LRU entry, wherein 
the processing unit indicates that the new data entry is at the 
MRU entry of the third data structure after determining that
the new data entry is sequentially accessed; and indicates 
that the new data entry is at the MRU entry of the fourth data 
structure after determining that the new data entry is non- 
sequentially accessed. 

40. The memory area of claim 33, wherein the memory 
area further comprises: 

a third data structure indicating data entries in the first and 
second data structures, wherein each data entry in the 
third list has a modified time value indicating when the
data entry was first modified in cache; and 

a base time value indicating a previous time value, 
wherein the processing determines data entries in the 
third data structure having a time value that is older 
than the base time value and, for each determined data 
entry in the third data structure, destages the copy of the 
modified data maintained in cache to the storage unit. 
A method and apparatus for serializing access to disk arrays
shareable among a plurality of RAID control units at a
substantial reduction in intercontrol unit communication by
(a) defining a lock function over the parity image blocks at
each of the disk drives of a shared disk array; and (b)
executing a path expression at each accessing control unit,
on the parity image and enforcing a busy-wait until a lock is

DEVICE LEVEL COORDINATION OF
ACCESS OPERATIONS AMONG MULTIPLE 
RAID CONTROL UNITS 

FIELD OF THE INVENTION

This invention relates to storage subsystems formed from
cyclic, multitracked devices, and more particularly to storage
subsystems of the RAID 3 or RAID 5 type in which
multiple RAID control units share access to arrays of such 
cyclic devices. 

DESCRIPTION OF RELATED ART 

The succeeding paragraphs briefly describe the data 
availability, storage capacity, and data rate tradeoffs among
several RAID disk array configurations. This is followed by 
a discussion of aspects of the prior art management of 
accessing disk drives and arrays shared among two or more 
control units. 

RAID Arrays, Fault Tolerance, and Recovery

In the prior art, RAID arrays of cyclic, tracked storage 
devices have found use in increasing the availability and 
reliability of stored data. The increased availability and 
reliability is achieved by dedicating a portion of the capacity 
to redundant information. In the presence of corrupted data 
or unavailable disk drives, the RAID control unit uses the 
redundancy to either reconstruct data on the fly, rebuild data 
on spare disks, or both. Thus, in a RAID 1 configuration, 
each update to data is written out to two disks. If any single 
disk fails, then the duplicate disk is electronically switched 
into the access path as its replacement. Although remarkably 
fault tolerant, an N disk RAID 1 array has a storage capacity 
limited to N/2 of its drives.

A RAID 3 configuration is data rate intensive and sustains 
a data rate N times the rate of a single disk. Also, it creates 
a logical track N times the size of a physical track. In RAID 
3, N data blocks at a time are written or read across N 
counterpart synchronized disks and a parity image on an 
(N+1)st drive. Unfortunately, the high data rate also means
that the concurrency rate is low. That is, only one application 
can access the drives. 
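
How a parity image lets a control unit reconstruct data from an unavailable drive can be shown with a short Python sketch. It assumes bytewise XOR parity, as in the parity recalculation described later in this section, and equal-length blocks; the function names are hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data_blocks):
    """Parity image for one stripe of data blocks."""
    return xor_blocks(data_blocks)


def reconstruct(surviving_blocks, parity):
    """Recover the one missing data block from the survivors and the parity."""
    return xor_blocks(list(surviving_blocks) + [parity])
```

Because XOR is its own inverse, combining the parity with every surviving block cancels their contributions and leaves only the missing block. This works for exactly one missing block per stripe.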

In contrast, a RAID 5 configuration is transaction or 
concurrency intensive. As illustrated in Clark et al., U.S. Pat. 
No. 4,761,785, "Parity Spreading to Enhance Storage 
Access", issued Aug. 2, 1988, N-1 data blocks and an 
associated parity image are written across N asynchronous 
disk drives in the same physical address range such that no 
single drive stores two or more blocks from the same parity
group, and such that no single drive stores all of the parity 
blocks. 
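
One way to satisfy both placement constraints is to rotate the parity block from drive to drive on successive stripes. The Python sketch below shows such a rotation. It is an assumed, commonly used pattern chosen for illustration and is not claimed to be the layout of the cited Clark et al. patent; the function name and block labels are hypothetical.

```python
def stripe_layout(stripe, n_disks):
    """Return the block label held by each disk for the given stripe.

    "P" marks the parity block; "D0", "D1", ... mark the N-1 data blocks.
    The parity position moves one disk to the left on each successive stripe.
    """
    parity_disk = (n_disks - 1 - stripe) % n_disks
    layout = []
    data_index = 0
    for disk in range(n_disks):
        if disk == parity_disk:
            layout.append("P")
        else:
            layout.append(f"D{data_index}")
            data_index += 1
    return layout
```

For four disks, stripe 0 places parity on the last disk and stripe 1 on the third, so over any four consecutive stripes each disk holds parity exactly once.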

It should be recalled that the RAID 3 configuration is
affected by adverse loading. Each read and write requires
that all drives be accessed, including the parity drive. 
However, in the RAID 5 context, adverse loading can be 
minimized in several ways. First, since parity is spread out 
among the disks, no single disk bears all the parity loading.
Indeed, Mattson et al., U.S. Pat. No. 5,265,098, "Method 
and Means for Managing DASD Array Accesses When 
Operating in Degraded Mode", issued Nov. 23, 1993, pro- 
posed spreading data and parity blocks out among the disks 
in a pattern forming a balanced incomplete block design 
such that adverse loading would be minimized, even where 
a disk failed and the array was operating in a fault-tolerant 
mode. 

There have been many proposals both for operating RAID 
5 arrays and the like in degraded mode and returning the 
information state of any given array back to a fault-tolerant 
mode. In this regard, Dunphy, Jr. et al., U.S. Pat. No.
4,914,656, "Disk Drive Memory", issued Apr. 3, 1990,
describe the use of a pool of hot spare disks available for
rewriting in the event of single disk failure and in the
presence of single image parity groups.

Similarly, Ng et al., U.S. Pat. No. 5,278,838, "Recovery 
From Errors in a Redundant Array of Disk Drives", issued 
Jan. 11, 1994, disclose the online scheduling and rebuilding 
of data on a spare DASD or the like, the data having been
stored on unavailable disk drives in a type RAID 1, 4, or 5
array. The Ng invention relies upon coded error detection 
and assumes that failures would occur as random indepen- 
dent events. 

Management of Accessing Among RAID Control Units and
Shared Disk Arrays

In a RAID 5 disk array, updating one or more of the N 
blocks in a stripe stored on N disks requires four or more 
access operations and the recalculation of the parity block 
image. The data and parity blocks move between a control 
unit resident cache or buffer and two or more of the disk 
drives. The operations consist of (1) reading the old data 
block from disk; (2) reading the old parity block from disk; 
(3) writing the new data block to disk, recalculating new 
parity as the XOR of the old parity, old data, and the new 
data; and (4) writing the new parity block to disk. 
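
By way of illustration only, the parity recalculation in the update sequence above may be sketched in Python as follows. The function names and the toy block contents are assumptions made for this sketch and do not appear in the specification; the sketch shows only the XOR arithmetic, not the disk accesses.

```python
def xor_blocks(*blocks: bytes) -> bytes:
    """Return the bytewise XOR of equally sized blocks."""
    size = len(blocks[0])
    if any(len(block) != size for block in blocks):
        raise ValueError("all blocks must have the same length")
    result = bytearray(size)
    for block in blocks:
        for i, value in enumerate(block):
            result[i] ^= value
    return bytes(result)


def updated_parity(old_parity: bytes, old_data: bytes, new_data: bytes) -> bytes:
    """New parity for a one-block update: old parity XOR old data XOR new data."""
    return xor_blocks(old_parity, old_data, new_data)


# A toy stripe of three data blocks plus its parity (contents are made up).
stripe = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(*stripe)

new_block = b"\xff\x00\xff\x00"
new_parity = updated_parity(parity, stripe[1], new_block)

# The shortcut agrees with recomputing parity over the whole updated stripe.
assert new_parity == xor_blocks(stripe[0], new_block, stripe[2])
```

The shortcut matters because it touches only the changed data block and the parity block, which is why a single-block update costs four accesses rather than a read of the entire stripe.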

When two or more RAID control units seek access to data 
on drives in a shared array, there are myriad opportunities to 
corrupt data, such as where RAID read and write operations 
occur concurrently and where they involve more than one 
drive. Avoidance requires that RAID operations be coordinated 
in order to preserve data integrity. In the prior art, the 
RAID control units have been required to communicate and 
negotiate a lock-like state serializing their access to the disk 
data. The constructs for serializing access between the RAID 
control units would frequently be some form of shared 
variable or message passing. Shared variable synchroniza- 
tion includes test and set, semaphores, conditional critical 
regions, monitors, or path expressions. 

While any one of these constructs, when utilized by both 
control units, serializes access to the same resource and 
preserves data integrity, they require that each RAID control 
unit devote considerable bandwidth to originating and sending 
messages as well as receiving and interpreting messages 
from the other control unit. Given that RAID control units 
may be configured in network relationships, the message 
traffic required to synchronize access to shared disk arrays 
becomes nonlinear. Cumulatively, the network bandwidth 
dedicated to synchronizing communications reduces the 
bandwidth available to either data rate or concurrency, or 
other storage and data management tasks. 

SUMMARY OF THE INVENTION 

It is accordingly an object of this invention to devise a 
method and apparatus for serializing access to disk arrays 
shareable among a plurality of RAID control units at a 
substantial reduction in intercontrol unit communication. 

It is yet another object that such method and means 
effectuate serialized access to data on said devices indepen- 
dent of any consistent state as among concurrently accessing 
RAID control units. 

It is a related object that such method and apparatus 
minimize reduction in either data rate or concurrency as a 
function of differences among random or sequential access 
patterns. 

The above objects are believed satisfied by a method and 
apparatus for selectably locking a transfer path between one 
of several RAID control units to a shared array of cyclic, 
multitracked storage devices (disk drives). The control units 
perform RAID functions on at least one data block and a 
parity image of a parity set defined over the disk drives. The 
blocks are distributed such that no single disk drive in the 
array stores more than one block from an associated data set. 
The method steps include defining a lock function on each 
disk drive. This enables the disk drive to enqueue lock 
requests from accessing control units, grant lock requests to 
a requesting drive in enqueued order to an available parity 
image stored on the device, and to release the lock responsive 
to a control unit signal. The next step involves executing 
at the control unit a RAID function embedded within a path 
expression, a path expression being a construct for synchronizing 
and ordering the activity relationship between the 
control unit and one or more array drives. In this regard, the 
RAID function requests an explicit lock on at least one 
predetermined parity image from a counterpart device. 
Significantly, the execution of any RAID function is inhibited 
until grant of the lock by the counterpart disk drive as 
a form of busy-wait. When a lock is granted by the disk 
drive, the control unit then proceeds to execute the RAID 
function on at least one data block and parity image of an 
associated parity set from the counterpart devices. The 
control unit signals the drive upon termination of the RAID 
function, thereby causing the device to relinquish the lock. 

In this invention, the RAID control units are preferably of 
a RAID 3 type and RAID 5 type or combination thereof. 
Both types satisfy the constraint that no single drive stores 
more than one block from the same parity set. Additionally, 
a RAID 5 array is further limited in that no single drive 
stores all the parity images for the parity sets defined onto 
the array. 

A path expression comprises a composite function 
executed at a control unit specifying an ordering of uninterruptible 
procedures interpreted by at least one of the 
devices. It forms a coroutining relationship with at least one 
of the storage devices. The ordering includes effectuating the 
lock access to the parity image on the counterpart device, 
executing a RAID function over at least one data block and 
the parity image, and relinquishing the lock access. 
Advantageously, each path expression executed at a first 
control unit is independent of the consistency state of every 
other RAID control unit. 

In this invention, a RAID function is selected from a set 
consisting of reading, writing, or write modification of data 
blocks and parity image of a parity set, data block or parity 
image regeneration, and parity set rebuilding. Typically, 
such RAID functions include multiple read and write commands 
of which any lock request would be embedded in a 
first read or write command in the RAID function. 

BRIEF DESCRIPTION OF THE DRAWING 

FIG. 1 shows a logical block diagram of a RAID 5 disk 
array in a hierarchical storage subsystem according to the 
prior art. 

FIGS. 2A-2B respectively depict a RAID-organized disk 
storage subsystem directly attached to a single host and a 
configuration where at least two disk arrays are attached 
between two or more RAID control units. 

FIG. 3 sets out the control and data flow of the command 
and data transfer paths of a typical SCSI disk drive array 
attached as in FIGS. 2A and 2B. 

FIG. 4 illustrates the flow of control for RAID device 
level parity block locking according to the invention. 

DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

Preliminary Comments and Clark as a RAID Array Paradigm 

Before describing the preferred embodiment, several preliminary 
matters should be considered. First, in this preferred 
embodiment, the RAID arrays of choice are of the 
RAID 3 and RAID 5 types. Second, the general attributes of 
RAID device types 1-5 are fully described in the Clark et al. 
patent and the references cited therein. Third, the term 
"parity set" is any logical association of N data blocks and 
a parity image taken thereover as the (N+1)st block. Fourth, 
the terms "logical block" and "block" are used synonymously 
and refer to a fixed length of addressable storage 
extent used on a disk drive. Fifth, the term "enqueue" 
connotes the operation of placing a resource request or 
command in a queue or waiting list. Classically, enqueuing 
connotes a pair of synchronization primitives. The first 
primitive "enqueue" is the operation of placing a resource 
request in a queue under some discipline, such as FIFO. The 
request waits until the resource becomes available. At this 
point, the second primitive "dequeue" is invoked as the 
operation of placing the resource under the control of the 
requesting source and removing it from the queue or list. 

In both RAID 3 and RAID 5 arrays, there are several ways 
of creating or defining parity sets over the disks. One 
convenient method is to define them by their storage proximity 
of blocks on disk drives as in the aforementioned Clark 
'785 patent or in the Mattson '098 patent. In 
Clark, a parity image is formed from the data that is 
stored in the same range R of contiguous physical addresses 
on each of N disks with the parity image also being stored in 
the range on the (N+1)st disk. The N+1 blocks collectively 
are termed a "stripe" or a parity set. In Clark, the location of 
the parity image is rotated or spread in a round-robin manner 
from stripe to stripe. From Clark, two logical relations can 
be discerned. First, the data blocks in the same parity set or 
stripe are resident in the same address range R on the 
disks. Second, the data blocks are covered by the same parity 
image. A similar convention is expressed in Mattson. 

Serialization of Access to a Shared Resource 

In the past, concurrent access to shared disk data was 
managed either by high-level CPU lock-oriented serialization 
or significant intercontrol unit communications supporting 
a busy-wait condition. The embodiment in FIG. 1 
illustrates a typical RAID 5 array as a substitute large logical 
disk positioned in a hierarchical storage subsystem with no 
intercontrol unit communications. 

Referring now to FIG. 1, there is shown a logical block 
diagram of a RAID 5 disk array in a hierarchical storage 
subsystem according to the prior art. The RAID array 202, 
211 is accessed by applications executing either on multitasking 
hosts CPU 1 or CPU 3, such as an IBM System/390 
running under the IBM MVS operating system. The access 
is imposed over a path including the storage control unit 6 
exemplified by an IBM 3990 SCU Mod 6. In this 
configuration, the RAID array 202, 211 is externally 
oper[...] disk drives 41 and 43 (such as the IBM 3390) 
under control unit 35. 

The subsystem depicted in FIG. 1 is designed such that 
[...] 43 can be accessed over any one of at least two 
failure-independent paths from either one of the CPUs 1 or 3, 
although the system as shown provides four failure-independent 
paths. Illustratively, data on devices 211 can be 
reached over any one of paths 21, 23, 25, or 27. The same 
holds for data stored on devices 41 or 43 via control unit 35. 
A description of this principle is to be found in Luiz et 
al., U.S. Pat. No. 4,207,609, "Method and Means for Path 
Independent Device Reservation and Reconnection in a 
Multi-CPU and Shared Device Access System", issued Jun. 
10, 1980. 
In FIG. 1, when two applications on the same or different 
CPUs seek concurrent access to the same data sets within the 
same data volume, the condition in MVS is resolved through 
the use of RESERVE/RELEASE or similar commands. 
Under the MVS operating system, RESERVE is a command 
that turns a hardware lock against an entire disk volume so 
that no other CPU may access it. For reasons previously 
mentioned, such high-level inspired serialization involves 
significant CPU processing overhead, denies path access to 
significant amounts of disk-stored data, and increases susceptibility 
to deadlock. The latter occurs when applications 
on CPU 1 and CPU 3 each reserve two separate disk 
volumes, and then each requests the volume that the other 
has reserved. 

Single and Shared RAID Disk Arrays 

Referring now to FIG. 2A, there is shown a RAID 5 disk 
array organized storage subsystem directly attached to a 
single host CPU 200. In this configuration, a RAID 5 
subsystem includes a control unit 202 and four attached disk 
drives 211, otherwise denominated as in FIG. 1 as disks 307, 
309, 311, and 313 such as may be found in an IBM RAMAC 
Array DASD attaching one or more Enterprise System 
(S/390) ECKD channels through an IBM 3990 Mod 3 or 6 
storage control unit. The RAMAC array disk storage subsystem 
comprises a rack with a capacity between 2 to 16 
drawers. Each drawer includes four disk drives 
HDD0-HDD3 (211) and a control unit 202. The RAID 
control unit or control unit 202 includes cooling fans, control 
processor 207, ancillary processors 203, and a nonvolatile 
drawer cache 205. 

Functionally, a device attachment unit 201 provides electrical 
and signal coupling between the CPU 200 and one or 
more RAID 5 drawers. As tracks are staged and destaged 
through this interface, they are converted from variable-length 
CKD format to fixed-block length FBA format by the 
ancillary processors 203. In this regard, drawer cache 205 is 
the primary assembly and disassembly point for the blocking 
and reblocking of data, the computation of a parity block, 
and the reconstruction of blocks from an unavailable array 
disk drive. A typical configuration would consist of several 
drawers. An additional drawer (not shown) would include 
four disk drives operable as "hot spares". This is an alternative 
to siting a "hot spare" within each of the operational 
drawers. 

In this embodiment, the four disk drives 307-313 are used 
for storing parity groups. If a dynamic (hot) sparing feature 
is used, then the spare must be defined or configured a priori 
in the spare drawer. Space among the four operational array 
devices is distributed such that there exists three DASD's 
worth of data space and one DASD's worth of parity space. 
It should be pointed out that the disk drives 211, the cache 
205, and the processors 203 and 207 communicate over a 
SCSI-managed bus 209. Thus, the accessing and movement 
of data across the bus between the disk drives 211 and the 
cache 205 is closer to an asynchronous message-type interface. 
A typical layout of CKD tracks and parity images of 
groups of CKD tracks over the disk drives follows the 
pattern described in the description of the prior art with 
reference to the Clark '785 patent. 

Referring now to FIG. 2B, there is shown a configuration 
where at least two disk arrays 211 and 213 are attached 
between two or more RAID control units 202A and 202B 
via SCSI buses 209A and 209B. Alternatively, two or more 
disk drives could be multidropped between the pair of buses. 
For purposes of this discussion, the arrays are logically 
organized such that array 211 comprises disks 307-313 and 
array 213 comprises disks 315-321. As pointed out in the 
discussion of the embodiment of FIG. 1, each disk drive is 
at least dual ported. Thus, either RAID control unit may 
access every one of the drives in either disk array. 

The problem confronted in this invention is where CPU 
301 and CPU 327 concurrently request access to drives 
within the same parity set or stripe. Indeed, the shared access 
conflict is expected to increase over time as client/server and 
network models proliferate. 

Disk Drive Execution Environment 

The solution broadly requires (a) defining a lock function 
over the parity image blocks at each of the disk drives of a 
shared disk array; and (b) executing a path expression at 
each accessing control unit, the path expression includes 
requesting a lock from the drive on the parity image and 
enforcing a busy-wait until a lock is granted, executing a 
RAID function, and releasing the lock. This requires that 
each drive have sufficient processing and local memory 
capability. In this regard, reference should be made to FIG. 
3. 

Referring now to FIG. 3, there is shown the control and 
data flow of the command and data transfer paths of a typical 
SCSI disk drive, such as would be used in arrays 211 or 213. 
The drive is organized around two processing paths, namely 
a command processing path and data transfer path. The 
command processing path includes a bus interface 411, a 
sequencer 457, a microprocessor 445, and a servo processor 
441. The data transfer path includes the bus interface 411, a 
data buffer 415, channel electronics 463, a read/write head 
425, and a cyclic, multitracked disk 427. 

In FIG. 3, data is streamed out to or derived from 
addressed tracks on the magnetic or optical disk 427 over the 
data path, while storage (read/write) and access (seek/set 
sector) commands are processed by a command path also 
within the disk drive 307. Commands and data from the host 
1 are passed through the interface 411. As suggested, the 
commands are interpreted and processed over the path 
including a sequencer 457, the microprocessor control unit 
445, servo processor logic 441, and the physical accessing 
mechanism 423-425 to the cyclic, tracked disk 427. In 
contrast, data is passed to or from tracks on the disk 427 via 
the interface 411, a data buffer 415, channel electronics 463, 
a read/write head 425 adjacent to the recorded data on the 
track, and amplifier electronics 422. 

The above-mentioned lock facility can be expressed at the 
disk drive 307 as a series of functions written into control 
store 451. Appropriate lock constructs, such as a lock table 
or list, lock queue, and command queue, are maintained at the 
microprocessor 445. The thesis of this invention is that the 
addition of appropriate locking functions and queues of lock 
requests in the disk drives can be used to coordinate the 
functions of accessing RAID control units to both maintain 
data integrity and eliminate intercontrol unit communication. 
More particularly, drive-level locking combined with 
executing access operations as path expressions (composite 
functions) will avoid conflicting operations between the 
control units. 

Lock Commands 

Implementation of any lock facility requires defining four 
additional device commands, namely, Read/Lock, Write/Lock, 
Lock, and Release/Lock. 

The Read/Lock command is issued by the control unit to 
a disk drive in an array storing a parity image addressed by 
the command. The Read/Lock command also recites a range 
of logical block addresses (LBAs) which prospectively are 
involved in update, rebuild, or regeneration operations. A 
command tag serves as an identifier for later release of the 
lock. 

The Read/Lock command can be processed in one of two 
ways. A first approach is for a control unit to send only the 
lock request and delay sending the read command until after 
receipt of the lock grant by the control unit from the disk 
drive. A second approach is for the control unit to send both 
parts of the command to the disk drive. If any part of the 
LBA range is already locked when the disk drive receives 
the Read/Lock command, the lock request as represented by 
this command will be enqueued at the disk drive until any 
previous lock on this LBA range is released. As soon as the 
lock becomes available, the drive can execute the read 
command. The first approach places the onus on the control 
unit, while the second passes the same to the disk drive. 

The Write/Lock command is issued by the control unit to 
a disk drive in an array with a range of LBAs for a full parity 
set or stripe write. The lock request and write delay is 
processed by the control unit and the disk drive in the same 
manner as is used in processing the Read/Lock command. 
However, after the lock has been granted, the command is 
executed as a normal write operation with respect to desig- 
nated data blocks and the parity image as reflected in the 
LBA address range designation. As mentioned above, the 
use of a single command allows earlier queuing of the write 
operation at the disk drive. 

The Lock command is issued by the control unit to a disk 
drive in an array storing a target parity image with a range 
of LBA addresses for either a partial stripe Write or a full or 
partial stripe Read. The lock portion is implemented in the 
same manner as that of the Read/Lock command. In this 
command set, there is no implied read or write operation. 
Any read or write command to the parity drive must be 
issued or given effect only after the lock has been granted. 

The Release/Lock command is issued with an appropriate 
identifier (initiator and tag) when the sequence for which the 
lock was issued is complete. This also pertains if the 
sequence must be canceled. In this regard, the Release/Lock 
command operates to cancel the corresponding lock. The 
command is rejected if there is no lock in effect or queued 
with that identifier. 
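
For illustration, the four lock commands and the information each carries may be modeled as below. This is a sketch under stated assumptions: the class names, field names, and the use of a Python range for the LBA extent are hypothetical and are not a command encoding given in the specification. The table of implied operations follows the text above, in which Read/Lock and Write/Lock carry an operation to perform once the lock is granted, whereas Lock and Release/Lock imply no read or write.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class LockCommand(Enum):
    READ_LOCK = "Read/Lock"
    WRITE_LOCK = "Write/Lock"
    LOCK = "Lock"
    RELEASE_LOCK = "Release/Lock"


# Operation implied once the lock is granted (None means no implied I/O).
IMPLIED_OPERATION = {
    LockCommand.READ_LOCK: "read",
    LockCommand.WRITE_LOCK: "write",
    LockCommand.LOCK: None,
    LockCommand.RELEASE_LOCK: None,
}


@dataclass(frozen=True)
class LockRequest:
    command: LockCommand
    initiator: str               # issuing control unit, e.g. "202A" (assumed label)
    tag: int                     # identifier later quoted by Release/Lock
    lba_range: Optional[range]   # left as None for Release/Lock in this sketch

    @property
    def identifier(self):
        """The (initiator, tag) pair that names a lock for later release."""
        return (self.initiator, self.tag)


request = LockRequest(LockCommand.READ_LOCK, "202A", 7, range(1000, 1501))
release = LockRequest(LockCommand.RELEASE_LOCK, "202A", 7, None)

# A release names the lock by the same initiator and tag as the request.
assert release.identifier == request.identifier
assert IMPLIED_OPERATION[request.command] == "read"
```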

Illustrative Path Expressions with Included RAID Functions 
As previously mentioned, a path expression is a synchro- 
nization construct to secure an interference-free interaction 
between a control unit and a drive storing a target parity 
image. Other terms, such as composite function or compos- 
ite operation, may be used synonymously. 

Three illustrative expressions set out below are Update 
Data, Rebuild Data, and Regenerate Data. The expressions 
are set out in a pseudo-code-like format and are executable 
by any of the control units such as 202A or 202B in FIG. 2B 
with respect to a disk drive 307 in array 211. 

I. Update Data 

(a) Issue a Read/Lock command with a predetermined tag 
and LBA range to the parity disk drive. 

(b) When an indication has been received from the disk 
drive that the lock has been granted, issue the Read com- 
mand (this constitutes reading of old data for Update Write 
purposes). 

(c) When the old data block has been read, issue a Write 
command to the disk drive storing the old data block to 
update the data in place. 

(d) When the old parity has been received by the control 
unit from execution of the Read/Lock command at the parity 
disk drive, generate a new parity and issue a Write command 
to the parity drive to update in place the parity image block. 

(e) When the control unit receives indication that both 
Write commands have been executed, issue a Release/Lock 
command to the parity drive. 
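
Steps (a) through (e) above can be sketched from the control unit's side as follows. Everything named here is hypothetical: `SketchDrive` is an in-memory stand-in for a drive, it locks the whole drive rather than an LBA range, and the control unit polls for the grant up to an assumed `max_polls` limit instead of receiving an indication from the drive. The sketch is single-threaded and only shows the ordering of the operations.

```python
class SketchDrive:
    """In-memory stand-in for a drive; one lock holder at a time."""

    def __init__(self, blocks):
        self.blocks = dict(blocks)   # LBA -> block contents
        self.lock_holder = None      # tag of the operation holding the lock

    def read_lock(self, tag, lba):
        """Read/Lock: return the block if the lock is granted, else None."""
        if self.lock_holder not in (None, tag):
            return None
        self.lock_holder = tag
        return self.blocks[lba]

    def read(self, lba):
        return self.blocks[lba]

    def write(self, lba, data):
        self.blocks[lba] = data

    def release_lock(self, tag):
        if self.lock_holder != tag:
            raise RuntimeError("no lock in effect with that identifier")
        self.lock_holder = None


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def update_data(data_drive, parity_drive, lba, new_data, tag, max_polls=1000):
    # (a) Read/Lock on the parity drive, waiting until the lock is granted.
    for _ in range(max_polls):
        old_parity = parity_drive.read_lock(tag, lba)
        if old_parity is not None:
            break
    else:
        raise TimeoutError("lock not granted")
    old_data = data_drive.read(lba)                  # (b) read the old data
    data_drive.write(lba, new_data)                  # (c) update data in place
    new_parity = xor(xor(old_parity, old_data), new_data)
    parity_drive.write(lba, new_parity)              # (d) update parity in place
    parity_drive.release_lock(tag)                   # (e) Release/Lock


d0 = SketchDrive({0: b"\x0f\x0f"})
d1 = SketchDrive({0: b"\xf0\x01"})
p = SketchDrive({0: xor(d0.read(0), d1.read(0))})

update_data(d0, p, 0, b"\x00\x00", tag="202A-1")
assert p.read(0) == xor(d0.read(0), d1.read(0))   # parity still covers the stripe
assert p.lock_holder is None                      # lock was relinquished
```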
II. Rebuild Data 

(a) Issue a Read/Lock with a tag and LBA range to the 
parity drive. 

(b) When an indication has been received from the disk 
drive that the lock has been granted, issue a Read command 
to each of the data drives in the target parity set. 

(c) When indications have been received that all of the 
Read commands have been executed, issue a Write com- 
mand to the drive being rebuilt. 

(d) When indication has been received that the Write 
command has been completed, issue a Release/Lock com- 
mand to the parity drive. 

III. Regenerate Data 

(a) Issue a Read/Lock with a tag and LBA range to the 
parity drive. 

(b) When an indication has been received from the disk 
drive that the lock has been granted, issue a Read command 
to each of the data drives in the target parity set. 

(c) When indications have been received that all of the 
Read commands have been completed, issue a Release/Lock 
command to the parity drive. 

(d) Transfer regenerated data to the host CPU independent 
of the disk drive operation. 
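
The Rebuild and Regenerate expressions both rest on the same reconstruction arithmetic, sketched below for illustration. The locking steps are omitted, the function names are assumptions of this sketch, and the block contents are made up. In Rebuild the reconstructed block would be written to the drive being rebuilt; in Regenerate it would be passed to the host.

```python
from functools import reduce


def xor_blocks(blocks):
    """Bytewise XOR across a list of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def regenerate(surviving_data, parity):
    """Reconstruct the one missing block of a parity set."""
    return xor_blocks(list(surviving_data) + [parity])


data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(data)

# Pretend the drive holding data[1] is unavailable.
lost = regenerate([data[0], data[2]], parity)
assert lost == data[1]
```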

Lock Constructs and Processing at the Disk Drives 

It was pointed out that contemporary disk drives, such as 
depicted in FIG. 3, include significant local operation scheduling 
and processing capacity. For this reason, each disk 
drive will include several lock management constructs sited 
in its local processor memory or equivalent. These constructs 
include a Command Queue, a Lock List, and a Lock 
Queue. 

A Command Queue is an ordered list of executable 
commands received by a disk drive and not yet executed. 
Executable commands are those commands that inherently 
either do not require locks or that have locks in effect. 
Commands in the Command Queue may be reordered by the 
disk drive to optimize performance. 

A Lock List is a list of granted locks currently in effect. 
A lock list associates a command identification with a 
corresponding storage address extent. Significantly, address 
extents must not overlap. 

A Lock Queue is an ordered list of lock requests or 
pending locks. Each lock request or lock queue entry asso- 
ciates a command identification with a corresponding extent 
of disk drive addresses. When a preexisting lock is released, 
the lock discipline requires that the list be searched for the 
first command for which a new lock can be granted. 
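
As a sketch only, the three constructs may be represented as below. The class and method names are hypothetical, extents are assumed to be inclusive LBA ranges, and the state used in the demonstration is made up. The `first_grantable` search reflects the discipline just described: on a release, the Lock Queue is scanned in order for the first request whose extent no longer overlaps any granted lock.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Extent:
    first_lba: int
    last_lba: int   # inclusive (an assumption of this sketch)

    def overlaps(self, other: "Extent") -> bool:
        return self.first_lba <= other.last_lba and other.first_lba <= self.last_lba


@dataclass
class LockConstructs:
    command_queue: deque = field(default_factory=deque)  # executable commands
    lock_list: dict = field(default_factory=dict)        # command ID -> granted Extent
    lock_queue: deque = field(default_factory=deque)     # (command ID, Extent) pending

    def is_locked(self, extent: Extent) -> bool:
        """True if any granted lock overlaps the given extent."""
        return any(extent.overlaps(held) for held in self.lock_list.values())

    def first_grantable(self):
        """First pending request whose extent is free, or None."""
        for entry in self.lock_queue:
            if not self.is_locked(entry[1]):
                return entry
        return None


state = LockConstructs()
state.lock_list["X1"] = Extent(1000, 1500)
state.lock_queue.append(("X2", Extent(1300, 1600)))   # blocked by X1
state.lock_queue.append(("X3", Extent(3000, 3500)))   # free

assert state.first_grantable() == ("X3", Extent(3000, 3500))
```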

Referring now to FIG. 4, there is shown the flow of 
control for RAID device level parity block locking according 
to the invention. Each new command is received by a 
disk drive in step 501 and assessed in step 503 as to whether 
it is a lock release, a lock request, or none of the above. If 
it is a lock release, control is transferred to step 517. If it is 
a lock request, control transfers to step 507. If none of the 
above, then the process jumps to step 515 where the command 
is added to the command queue. 

If the command is a lock request, step 507 determines 
whether any part of the LBA address range as recited in the 
new command is already under lock. If the address range is 
currently subject to another lock, then the new command 
identification (ID) and the address extent are added to the 
lock queue in step 509. On the other hand, if the extent is not 
under lock, the new command ID and extent are added to the 
lock list in step 511. At some subsequent point in time, the disk 
drive grants the lock in step 513 and adds the command ID 
and extent to the command queue. 

If the command as tested in step 503 is a lock release, then 
it is examined in step 517 as to whether the command ID and 
extent are already in the lock list. The presence of the 
command ID in the lock list will cause it to be deleted and 
the lock released in step 525. The lock queue will be tested 
for empty in step 527. If it is not empty, the lock queue is 
searched in step 529 and the first extent and associated 
command ID that are not locked are removed from the lock 
queue and added to the lock list in step 531. This further 
results in a lock being granted thereon in step 513 and the 
command added to the command queue in step 515. 

For completeness, it should be said that if a lock release 
is not in the lock list per step 517 and is not in the lock queue 
per step 519, then an error is reported in step 521. However, 
if it is in the lock queue per step 519, then it is deleted from 
the queue per step 523. 
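
The flow of control just described can be sketched as a small drive-side dispatcher. This is an illustrative sketch and not the drive microcode: the class, method, and command-kind names are assumptions, extents are treated as inclusive (low, high) pairs, and the step numbers in the comments refer to the steps of FIG. 4 as recited in the text. Commands placed on the command queue are not executed or removed in this sketch.

```python
from collections import deque


def overlaps(a, b):
    """True if inclusive LBA extents a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


class DriveLockManager:
    def __init__(self):
        self.command_queue = deque()   # (command ID, extent) ready to execute
        self.lock_list = {}            # command ID -> extent, locks in effect
        self.lock_queue = deque()      # (command ID, extent) awaiting a lock

    def receive(self, kind, cmd_id, extent=None):          # steps 501, 503
        if kind == "release":
            self._release(cmd_id)
        elif kind == "lock":
            self._request(cmd_id, extent)
        else:
            self.command_queue.append((cmd_id, extent))    # step 515

    def _locked(self, extent):                             # step 507
        return any(overlaps(extent, held) for held in self.lock_list.values())

    def _request(self, cmd_id, extent):
        if self._locked(extent):
            self.lock_queue.append((cmd_id, extent))       # step 509
        else:
            self._grant(cmd_id, extent)

    def _grant(self, cmd_id, extent):
        self.lock_list[cmd_id] = extent                    # step 511 or 531
        self.command_queue.append((cmd_id, extent))        # steps 513, 515

    def _release(self, cmd_id):
        if cmd_id in self.lock_list:                       # step 517
            del self.lock_list[cmd_id]                     # step 525
            for entry in list(self.lock_queue):            # steps 527, 529
                if not self._locked(entry[1]):
                    self.lock_queue.remove(entry)
                    self._grant(*entry)                    # step 531
                    break
            return
        for entry in list(self.lock_queue):                # step 519
            if entry[0] == cmd_id:
                self.lock_queue.remove(entry)              # step 523
                return
        raise LookupError(f"no lock held or queued for {cmd_id}")  # step 521


drive = DriveLockManager()
drive.receive("lock", "X1", (100, 200))
drive.receive("lock", "X2", (150, 250))   # overlaps X1, so it is queued
drive.receive("io", "X3", (900, 950))     # needs no lock
drive.receive("release", "X1")            # frees the extent; X2 is granted

assert list(drive.lock_list) == ["X2"]
assert [cmd for cmd, _ in drive.command_queue] == ["X1", "X3", "X2"]
```

Only the first grantable entry is promoted on each release, following the wording of steps 529 and 531; a drive could equally well rescan the queue until no further entry is grantable, which the text neither requires nor excludes.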
Example of the Lock Processing 

Suppose a series of commands have been received by disk 
drive 307 from a RAID controller in a predetermined order. 
Each command is associated with a command ID and an 
extent or range of addresses over which the command will 
operate. The command order is stipulated as follows: 

TABLE 1 

Command ID   LBA Extent   Lock Status of LBA Extent 
A1           1000-1500    Yes-effected 
B1           5000-6000    Not required 
A2           3000-3100    Yes-effected 
A3           1300-1600    Yes-queued until release by A1 
A4           6500-7000    Not required 
A5           3000-3500    Yes-queued until release by A2 and B2 
B2           3200-4000    Yes-effected 
B3           2900-3100    Yes-queued until release by A2 

The processor within the affected disk drive will form the 
lock constructs of a Command Queue, a Lock Queue, and a 
Lock List based on this example and the above definitions. 

The Command Queue would be: 

TABLE 2 

Command ID   LBA Extent 
A1           1000-1500 
B1           5000-6000 
A2           3000-3100 
A4           6500-7000 
B2           3200-4000 

The Lock Queue would be: 

TABLE 3 

Command ID   LBA Extent 
A3           1300-1600 
A5           3000-3500 
B3           2900-3100 

The Lock List would be: 

TABLE 4 

Command ID   LBA Extent 
A1           1000-1500 
A2           3000-3100 
B2           3200-4000 

Referring again to FIG. 4 together with Tables 1-4, it 
should be apparent that in processing the first entry in Table 
1, say command A1, it would be identified as a lock request 
in step 503. Tracing it further, it would also be apparent that 
no part of its LBA extent was under lock per step 507. 
Consequently, command A1 would be added to the lock list 
in step 511 (Table 4) and the lock granted in step 513. Lastly, 
it would be added to the command queue (Table 2) in step 
515. 

In contrast, command A3, while constituting a lock 
request in step 503, does have a portion of its LBA extent 
subject to a lock under A1 in step 507. Accordingly, it is 
entered into the lock queue (Table 3) in step 509. 

Commands B1 and A4 do not require locks and are moved 
directly onto the command queue (Table 2) per steps 503 and 
515. 

The processing of a lock release is not treated directly in 
the above example. However, command A5 will remain on 
the lock queue (Table 3) until locks associated with com- 
mands A2 and B2 are released. The lock release of A2 and 
B2 respectively by a control unit would be processed in a 
traverse including steps 503, 517, 525, 527 and 529 since the 
lock queue (Table 3) would not be empty. 
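
The request side of the example can be replayed with a few lines of Python. This is a sketch under stated assumptions: extents are treated as inclusive, the extent of command B2 is taken as 3200-4000 as listed in Table 4, and the `replay` function and its list-based structures are hypothetical stand-ins for the drive's constructs. Lock releases are not modeled here.

```python
# (command ID, inclusive LBA extent, lock required?) in the order of Table 1
TABLE_1 = [
    ("A1", (1000, 1500), True),
    ("B1", (5000, 6000), False),
    ("A2", (3000, 3100), True),
    ("A3", (1300, 1600), True),
    ("A4", (6500, 7000), False),
    ("A5", (3000, 3500), True),
    ("B2", (3200, 4000), True),
    ("B3", (2900, 3100), True),
]


def overlaps(a, b):
    """True if inclusive extents a=(lo, hi) and b=(lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def replay(commands):
    """Sort commands into command queue, lock list, and lock queue."""
    command_queue, lock_list, lock_queue = [], [], []
    for cmd_id, extent, needs_lock in commands:
        if needs_lock and any(overlaps(extent, held) for _, held in lock_list):
            lock_queue.append((cmd_id, extent))     # extent already under lock
            continue
        if needs_lock:
            lock_list.append((cmd_id, extent))      # lock granted
        command_queue.append((cmd_id, extent))      # command is executable
    return command_queue, lock_list, lock_queue


command_queue, lock_list, lock_queue = replay(TABLE_1)

assert [c for c, _ in command_queue] == ["A1", "B1", "A2", "A4", "B2"]  # Table 2
assert [c for c, _ in lock_queue] == ["A3", "A5", "B3"]                 # Table 3
assert [c for c, _ in lock_list] == ["A1", "A2", "B2"]                  # Table 4
```

Note that B2 is granted even though it arrives after A5 and overlaps A5's requested extent, because only locks already in effect are consulted; this is why A5 ends up waiting on both A2 and B2.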

While the invention has been described with respect to an 
illustrative embodiment thereof, it will be understood that 
various changes may be made in the method and means 
herein described without departing from the scope and 
teaching of the invention. Accordingly, the described 
embodiment is to be considered merely exemplary and the 
invention is not to be limited except as specified in the 
following claims. 

What is claimed is: 

1. A method for serializing access to individual storage 
devices in an array of storage devices, said array having 
parity groups of data blocks and an associated parity image 
block written across counterpart storage devices such that no 
drive stores more than one block of the same group, said 
array being addressable by two or more control units, 
comprising the steps of: 

(a) defining a lock function over the parity image blocks 
at storage devices of the storage device array; and 

(b) executing a path expression at an accessing control 
unit, the path expression includes requesting a lock 
from the storage device storing the parity image of the 
parity group addressed by the control unit, and enforc- 
ing a busy-wait until a lock is granted by the storage 
device, executing a RAID function, and then releasing 
the lock. 

2. The method according to claim 1, wherein the RAID 
function is one selected from a set consisting of modifying 
at least one data block and the parity image from at least one 
parity set on at least one device, and reconstructing at least 
one given data block or given parity image from remaining 
data blocks or parity image of any given parity set in the 
event of the unavailability of the given data block or parity 
image due to noise, corruption, or device failure and either 
staging it to a requesting control unit, writing it to at least 
one spare device or reserved area on the plurality of devices, 
or both. 

3. The method according to claim 1, wherein the path 
expression comprises a composite function specifying an 
ordering of uninterruptible procedures interpreted by at least 
one of the devices, the ordering includes effectuating lock 
access to the parity image on the counterpart device, execut- 
ing a RAID function over at least one data block and the 
parity image, and relinquishing the lock access. 

4. The method according to claim 1, wherein each path 
expression being executed at a first control unit is indepen- 
dent of a consistency state of a second control unit. 

5. The method according to claim 1, wherein the step of 
requesting a lock from the device includes the step of 
generating a lock-oriented command selected from a command 
set consisting of Read/Lock, Write/Lock, Lock, and 
Release/Lock, each lock-oriented command including a tag 
operable as an identifier for subsequent release of any 
granted lock. 

6. The method according to claim 1, wherein requesting 
a lock from the storage device further comprises requesting 
a lock of a parity group corresponding to a range of Logical 
Block Addresses (LBAs) requested by the control unit, and 
wherein enforcing a busy-wait further comprises enforcing 
a busy-wait until a lock of the parity group corresponding to 
the range of LBAs requested by the control unit is granted. 

7. The method according to claim 1, wherein each device
has defined thereon one data block from a first parity set and
a parity image of a second parity set, and wherein the
method further comprises the steps at the device of enqueu-
ing any lock request embedded in a first read or write
command embedded in a RAID function, said device being
responsive to at least one of a sequence of commands from
a control unit executing a path expression during the pen-
dency by the device of any current lock.

8. The method according to claim 7, wherein the RAID 
control units are of a type selected from a set consisting of 
type RAID 3 and type RAID 5 control units, and further 
wherein the RAID function is selected from a set consisting
of reading, writing, or write modification of data blocks and 
parity image of a parity set, data block or parity image 
regeneration, and parity set rebuilding. 

9. The method according to claim 7, wherein the step of 
executing a path expression includes the step of forming a
coroutining relationship with at least one of the storage 
devices. 

10. The method according to claim 7, further comprising 
terminating the RAID function which includes signaling the 
device upon completion of all read and write commands
within the RAID function. 

11. The improvement according to claim 10, wherein said 
first circuits include circuits for ascertaining whether a parity 
set covered by a lock request is concurrently subject to a 
lock in whole or in part and if subject to a lock said lock
request is enqueued, and if not subject to a lock said lock 
request is granted and any command associated with said 
request in the path expression is placed on a command queue 
for execution by the device. 

12. A method for establishing a locked path between one
of a plurality of RAID control units and at least one of a
plurality of cyclic, multitracked storage devices, said control
units concurrently accessing selected blocks of data from
parity imaged data sets defined over the devices, the blocks
being distributed such that no single device stores more than
one block from an associated parity imaged data set, com-
prising the steps at each of the RAID control units of:

(a) defining a facility at each of the devices for enqueuing
lock requests from said control units and for granting,
maintaining, and releasing locks on the parity image
resident on the device, said locks being granted in
enqueued lock request order; and

(b) executing a path expression at respective control units
inclusive of a RAID function including:

(1) requesting a lock from the device on the parity
image of an associated set, at least one of whose
blocks is addressed in a counterpart RAID function;

(2) inhibiting execution of any RAID function with
respect to any parity image or data blocks addressed
by said RAID function until a lock is granted by the
device storing the parity image to the counterpart
RAID control unit;

(3) executing the RAID function with reference to the
parity image and data blocks addressed by the
request responsive to the grant of the lock; and

(4) terminating the RAID function and causing release 
of said lock by the granting device. 

13. A method for path locking one of a plurality of RAID 
control units to at least one of a plurality of shareable cyclic, 
multitracked storage devices, the control units performing 
RAID functions on at least one data block and a parity image 
of a parity set defined over the devices, the blocks being 
distributed such that no single device stores more than one 
block from an associated data set, comprising the steps of: 

(a) defining a lock function on each device and causing 
said device to enqueue lock requests from accessing 
control units, grant lock requests to a requesting device 
in enqueued order to an available parity image stored 
on the device, and release of the lock responsive to 
control unit provided indicia; and 

(b) executing by at least one control unit of a path 
expression inclusive of a RAID function including: 

(1) requesting an explicit lock on at least one prede- 
termined parity image from a counterpart device; 

(2) inhibiting the execution of the RAID function until 
grant of the lock by the counterpart device; 

(3) executing the RAID function on at least one data 
block and parity image of an associated parity set 
from the counterpart devices; and 

(4) terminating the RAID function and providing indi- 
cia to the lock granting device. 
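
The lock function and path expression recited in claim 13 can be pictured with the following minimal Python sketch. It is an assumption-laden illustration, not the claimed implementation: the class DeviceLockFacility, the helper run_when_granted, and the string tags are all hypothetical, and a real control unit would poll or be signaled rather than call the helper twice.

```python
from collections import deque

class DeviceLockFacility:
    """Hypothetical per-device lock facility for parity images."""

    def __init__(self):
        self.holder = {}   # parity_id -> tag of the control unit holding the lock
        self.queue = {}    # parity_id -> deque of tags waiting in request order

    def request(self, parity_id, tag):
        """Grant at once if the parity image is free, otherwise enqueue."""
        if parity_id not in self.holder:
            self.holder[parity_id] = tag
            return True
        self.queue.setdefault(parity_id, deque()).append(tag)
        return False

    def holds(self, parity_id, tag):
        return self.holder.get(parity_id) == tag

    def release(self, parity_id, tag):
        """Release on indicia from the holder; grant to the next waiter, if any."""
        if self.holder.get(parity_id) != tag:
            raise ValueError("release by a control unit that does not hold the lock")
        waiters = self.queue.get(parity_id)
        if waiters:
            self.holder[parity_id] = waiters.popleft()
        else:
            del self.holder[parity_id]

def run_when_granted(device, parity_id, tag, raid_function):
    """Steps (2)-(4): inhibit until the grant, execute, then signal release."""
    if not device.holds(parity_id, tag):
        return False                      # execution inhibited; caller retries
    raid_function()
    device.release(parity_id, tag)
    return True

dev = DeviceLockFacility()
log = []
dev.request("P0", "cu-A")                 # step (1): granted immediately
dev.request("P0", "cu-B")                 # step (1): enqueued behind cu-A
blocked = run_when_granted(dev, "P0", "cu-B", lambda: log.append("B"))
run_when_granted(dev, "P0", "cu-A", lambda: log.append("A"))
run_when_granted(dev, "P0", "cu-B", lambda: log.append("B"))
```

In this sketch the second control unit's RAID function runs only after the first has terminated and released, which mirrors the enqueued-order grant of the claim.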

14. In a storage subsystem having a plurality of cyclic,
multitracked recording devices, each device storing blocks
of data, a first and a second RAID control unit and coupling 
said recording devices, each control unit including an 
arrangement responsive to a write request for generating a 
parity block as a function of a set of data blocks and for 
writing the data blocks and the parity block for each set out 
to predetermined ones of the recording devices such that no 
single device stores more than one block from any one set 
and such that no single device stores all the parity blocks for 
the recorded sets, wherein the improvement at each device 
comprises: 

a lock facility including a lock manager, a status list of 
lockable parity blocks and associated sets stored on the
counterpart device, and a lock request queue; 

first circuits including the lock facility responsive to a 
lock request from one or more control units for iden- 
tifying at least the parity image of a parity set and either 
granting a lock to the first requesting device in 
enqueued order on a parity image if resident and 
available on the device or enqueuing the same; 

second circuits including the lock facility for executing a 
sequence of accessing, updating, or regeneration tasks 
as specified by the control unit after grant of a lock; and 

third circuits including the lock facility for releasing the
lock responsive to termination indicia from the coun-
terpart control unit.

15. An article of manufacture comprising a machine-
readable memory having stored therein indicia of a plurality
of processor-executable control program steps for path lock-
ing one of a plurality of RAID control units to at least one
of a plurality of shareable cyclic, multitracked storage
devices, the control units performing RAID functions on at
least one data block and a parity image of a parity set defined
over the devices, the blocks being distributed such that no
single device stores more than one block from an associated
data set, said plurality of indicia of control program steps
include:

(a) indicia of a control program step for defining a lock
function on each device and causing said device to
enqueue lock requests from accessing control units,
grant lock requests to a requesting device in enqueued
order to an available parity image stored on the device,
and release of the lock responsive to control unit
provided indicia; and

(b) indicia of a control program step for executing a path
expression inclusive of a RAID function by at least one
control unit including:

(1) requesting an explicit lock on at least one prede-
termined parity image from a counterpart device;

(2) inhibiting the execution of the RAID function until
grant of the lock by the counterpart device;

(3) executing the RAID function on at least one data
block and parity image of an associated parity set
from the counterpart devices; and

(4) terminating the RAID function and providing indi-
cia to the lock granting device.

16. A storage subsystem comprising:

a plurality of storage devices having a lock function to
lock parity image blocks in the storage device; and

a first and a second control unit, each coupled to the
plurality of storage devices, wherein to access a storage
device, an accessing control unit executes a path
expression, the path expression includes requesting a
lock from the storage device storing the parity image of
the parity group addressed by the control unit, and
enforcing a busy-wait until a lock is granted by the
storage device, executing a RAID function, and then
releasing the lock.

17. The storage subsystem of claim 16, wherein the
storage devices have defined thereon one data block from a
first parity set and a parity image of a second parity set, and
wherein [illegible] comprises logic to enqueue any lock
request embedded in a first read or write command
embedded in a RAID function, being responsive to at least
one of a sequence of commands from a control unit
executing a path expression during pendency by the device
of any current lock.

18. The storage subsystem of claim 16, wherein executing
a path expression includes forming a coroutining relation-
ship with storage devices.

19. The storage subsystem of claim 16, wherein the
storage device [illegible] logic to determine whether a parity
set covered by a lock request is concurrently subject to a
lock in whole or in part and if subject to a lock said lock
request is enqueued, and if not subject to a lock said lock
request is granted and any command associated with said
request in the path expression is placed on a command queue
for execution by the device.

(57) ABSTRACT

Disclosed is a method, system, and program for prestaging 
data into cache from a storage system in preparation for data 
transfer operations. A first processing unit communicates 
data transfer operations to a second processing unit that 
controls access to the storage system. The first processing 
unit determines addressable locations in the storage system 
of data to prestage into cache and generates a data structure 
capable of indicating contiguous and non-contiguous
addressable locations in the storage
system including the data to prestage into the cache. The first 
processing unit transmits a prestage command to the second 
processing unit. The prestage command causes the second 
processing unit to prestage into cache the data at the addres- 
sable locations indicated in the data structure. The first 
processing unit then requests data at the addressable loca- 
tions indicated in the data structure. In response, the second 
processing unit returns the requested data from the cache.

FIG. 3

PRESTAGING DATA INTO CACHE IN
PREPARATION FOR DATA TRANSFER 
OPERATIONS 

BACKGROUND OF THE INVENTION 

1. Field of the Invention 

The present invention relates to a method, system, and 
program for prestaging data into cache from a storage
system in preparation for data transfer operations. 

2. Description of the Related Art 

Data prestaging techniques are used to prestage data from 
a non-volatile storage device, such as one or more hard disk 
drives, to a high speed memory, such as a volatile memory 
device referred to as a cache, in anticipation of future data 
requests. The data requests may then be serviced from the 
high speed cache instead of the storage device which takes 
longer to access. In this way, data may be returned to the 
requesting device faster. 

During a sequential read operation, an application 
program, such as a batch program, will process numerous 
data records stored at contiguous locations in the storage
device. It is desirable during such sequential read operations 
to prestage the sequential data into cache in anticipation of 
the requests from the application program. Present tech-
niques used to prestage sequential blocks of data include
sequential caching algorithms, such as those
described in the commonly assigned patent entitled 
"CACHE DASD Sequential Staging and Method," having 
U.S. Pat. No. 5,426,761. A sequential caching algorithm 
detects when a device is requesting data as part of a 
sequential access operation. Upon making such a detection, 
the storage controller will begin prestaging sequential data 
records following the last requested data record into cache in 
anticipation of future sequential accesses. The cached 
records may then be returned to the application performing 
the sequential data operations at speeds substantially faster 
than retrieving the records from a non-volatile storage 
device. 
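
A sequential caching algorithm of the general kind described above might be sketched as follows. This is a simplified illustration only; it does not reproduce the algorithm of U.S. Pat. No. 5,426,761, and the class name SequentialDetector, the threshold of three consecutive requests, and the prestage count of four records are all assumptions.

```python
class SequentialDetector:
    """Hypothetical detector: flags a stream after `threshold` consecutive records."""

    def __init__(self, threshold=3, prestage_count=4):
        self.threshold = threshold
        self.prestage_count = prestage_count
        self.last = None   # last record number requested
        self.run = 0       # length of the current consecutive run

    def on_request(self, record):
        """Return the record numbers to prestage after this request."""
        if self.last is not None and record == self.last + 1:
            self.run += 1
        else:
            self.run = 1
        self.last = record
        if self.run >= self.threshold:
            # Prestage the records that follow the last requested record.
            return list(range(record + 1, record + 1 + self.prestage_count))
        return []

det = SequentialDetector()
results = [det.on_request(r) for r in (10, 11, 12, 40)]
```

Here the third consecutive request triggers prestaging of the following records, and the jump to record 40 resets the detector.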

Another prestaging technique includes specifying a block 
of contiguous data records to prestage into cache in antici-
pation of a sequential data request. For instance, the Small
Computer System Interface (SCSI) provides a prefetch 
command, PRE-FETCH, that specifies a logical block 
address where the prestaging operation begins and a transfer 
length of contiguous logical blocks of data to transfer to 
cache. The SCSI PRE-FETCH command is described in the 
publication "Information Technology-Small Computer Sys- 
tem Interface-2," published by ANSI on Apr. 19, 1996, 
reference no. X3.131-199x, Revision 10L, which publica-
tion is incorporated herein by reference in its entirety.

Both these techniques for prestaging data records in 
anticipation of sequential operations are not useful for data 
records that have a logical sequential relationship but are 
stored at non-contiguous or dispersed physical locations in 
the storage device. Such prior art prestaging techniques are 
intended for sequential operations accessing data records 
stored at contiguous physical locations. For instance, the 
sequential detection algorithms and SCSI PRE-FETCH 
command do not prestage non-contiguous blocks. If the 
sequential detection algorithms and the SCSI PRE-FETCH 
command are used to prestage a range of data records
that spans the non-contiguously stored data records that
are needed, then they will also prestage data records that the 
application program does not need. The above techniques 
waste processor cycles and cache storage space by prestag-
ing data records that will not be requested. Thus, current
prestaging techniques do not provide an optimal solution for
prestaging non-contiguous tracks into cache.
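
The waste described above can be made concrete with a small Python sketch. It models a prefetch request only as a starting logical block address plus a transfer length and does not build an actual SCSI command; the function name contiguous_prefetch and the block numbers are hypothetical.

```python
def contiguous_prefetch(start_lba, transfer_length):
    """Blocks covered by a single start-address-plus-length prefetch."""
    return set(range(start_lba, start_lba + transfer_length))

# Hypothetical non-contiguous blocks that the application actually needs.
needed = {100, 101, 250, 251, 400}

# The smallest single contiguous range that covers every needed block.
start = min(needed)
length = max(needed) - start + 1
staged = contiguous_prefetch(start, length)

# Blocks brought into cache that will never be requested.
wasted = staged - needed
```

With these assumed numbers, 301 blocks are staged to serve 5 needed blocks, so 296 staged blocks occupy cache space without being requested.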

Thus, there is a need in the art for improved prestaging 
techniques. 

SUMMARY OF THE PREFERRED 
EMBODIMENTS 

To overcome the limitations in the prior art described 
above, preferred embodiments disclose a method, system, 
and program for prestaging data into cache from a storage 
system in preparation for data transfer operations. A first 
processing unit communicates data transfer operations to a 
second processing unit that controls access to the storage 
system. The first processing unit determines addressable 
locations in the storage system of data to prestage into cache 
and generates a data structure capable of indicating contigu- 
ous and non-contiguous addressable locations in the storage 
system including the data to prestage into the cache. The first 
processing unit transmits a prestage command and the data 
structure to the second processing unit. The prestage com- 
mand causes the second processing unit to prestage into 
cache the data at the addressable locations indicated in the 
data structure. The first processing unit then requests data at 
the addressable locations indicated in the data structure. In 
response, the second processing unit returns the requested 
data from the cache. 
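
The exchange between the two processing units summarized above can be sketched as follows. This is a minimal model under stated assumptions: the class StorageController, its methods, and the use of a plain list of track numbers as the data structure are hypothetical and stand in for the prestage command and channel protocol of the preferred embodiments.

```python
class StorageController:
    """Hypothetical second processing unit with a cache in front of storage."""

    def __init__(self, storage):
        self.storage = storage   # track number -> data held on the storage system
        self.cache = {}
        self.cache_hits = 0

    def prestage(self, tracks):
        """Prestage command: copy the indicated tracks into cache."""
        for track in tracks:
            self.cache[track] = self.storage[track]

    def read(self, track):
        """Serve a read from cache when possible, else from storage."""
        if track in self.cache:
            self.cache_hits += 1
            return self.cache[track]
        data = self.storage[track]   # slow path: access the storage device
        self.cache[track] = data
        return data

storage = {t: f"track-{t}" for t in range(16)}
controller = StorageController(storage)

# First processing unit: decide which (non-contiguous) locations to prestage.
to_prestage = [2, 7, 11]
controller.prestage(to_prestage)

# First processing unit then requests the same locations.
results = [controller.read(t) for t in to_prestage]
```

Every subsequent read in this sketch is a cache hit, which is the effect the summary attributes to the prestage command.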

In alternative embodiments, the storage system storage 
space is logically divided into multiple tracks, wherein each 
track includes one or more data records. Each data record 
includes an index area providing index information on the 
content of the data record and a user data area including user 
data. The addressable locations indicated in the data struc- 
ture comprise tracks in the storage system including the data 
records to prestage into the cache. In such embodiments, the 
data structure indicates addressable locations in the storage 
system including the data to prestage into the cache. 

In yet further embodiments, the data structure comprises 
a bit map data structure having bit map values for addres- 
sable locations in the storage system. Bit map values of one 
in the data structure indicate corresponding addressable 
locations including the data to prestage into cache. 
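
One way to picture such a bit map is the Python sketch below. The layout is an assumption made for illustration: one bit per track in a range, most significant bit first within each byte, with a value of one meaning "prestage this track". The function names build_bitmap and tracks_from_bitmap are hypothetical.

```python
def build_bitmap(first_track, num_tracks, tracks_to_prestage):
    """Return a bit map in which bit i set means track first_track + i."""
    bitmap = bytearray((num_tracks + 7) // 8)
    for track in tracks_to_prestage:
        offset = track - first_track
        if not 0 <= offset < num_tracks:
            raise ValueError(f"track {track} is outside the range")
        bitmap[offset // 8] |= 0x80 >> (offset % 8)   # assumed MSB-first order
    return bitmap

def tracks_from_bitmap(first_track, num_tracks, bitmap):
    """Decode the bit map back into the list of tracks to prestage."""
    return [first_track + i for i in range(num_tracks)
            if bitmap[i // 8] & (0x80 >> (i % 8))]

bitmap = build_bitmap(first_track=100, num_tracks=16,
                      tracks_to_prestage=[100, 103, 110])
decoded = tracks_from_bitmap(100, 16, bitmap)
```

A two-byte map thus describes sixteen tracks, of which only the three non-contiguous tracks with a bit value of one would be prestaged.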

In still further embodiments, the addressable locations in
the data structure correspond to data having a logical
sequential ordering within the first processing unit.

In further instances, the addressable locations in the
storage system including the data having the logical sequen- 
tial ordering are at non-contiguous addressable locations in 
the storage system. 

Preferred embodiments thus provide a mechanism to
prestage data into cache using a data structure indicating
addressable locations to prestage in a range of addressable
locations. Preferred embodiments are particularly applicable
to situations where an application program performs a
sequential operation to process data records according to a
logical sequential ordering. However, such data records
having the logical sequential ordering may be stored at
non-contiguous physical locations on a storage device. In
such case, the data structure of the preferred embodiments
can cause another processing unit, such as a storage con-
troller that controls access to the storage system, to prestage
into cache the data having a logical sequential relationship,
yet stored at non-contiguous physical locations in the stor-
age device. In this way, when the application program
requests the data having the logical sequential ordering
during sequential processing, the storage controller can
return the requested data directly from cache. Returning the
data from cache is substantially faster than retrieving the
requested data from non-contiguous physical locations from
the storage device.

Preferred embodiments thus improve system performance
for application programs performing sequential operations
on data logically ordered yet stored at non-contiguous
physical locations by prestaging the logically ordered data
into a high-speed cache.

BRIEF DESCRIPTION OF THE DRAWINGS

Referring now to the drawings in which like reference
numbers represent corresponding parts throughout:

FIG. 1 is a block diagram illustrating a software and
hardware environment in which preferred embodiments of
the present invention are implemented;

FIG. 2 illustrates a block diagram of data structures
utilized with preferred embodiments of the present inven-
tion;

FIG. 3 illustrates logic to prestage data in accordance with
preferred embodiments of the present invention; and

FIG. 4 illustrates a mapping of key values, user data
records, and physical storage locations that is utilized with
preferred embodiments of the present invention.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENTS

In the following description, reference is made to the
accompanying drawings which form a part hereof and which
illustrate several embodiments of the present invention. It is
understood that other embodiments may be utilized and
structural and operational changes may be made without
departing from the scope of the present invention.

Problems With Current Prestaging Techniques

A computer application program may maintain a logical
mapping or index that maps data records to key values
describing a particular ordering of data records. The com-
puter may also maintain a physical mapping that maps the
key values to physical locations on a storage device. There
may also be additional mappings to map the data records to
the exact physical locations on the storage device.

Oftentimes, an application program, such as a batch
program, may want to sequentially access numerous data
records according to key values that uniquely identify the
data records. However, the actual data records, which are
logically sequentially ordered according to the key values,
may be dispersed throughout the storage device, i.e., not
stored at contiguous physical locations. In such case, if the
storage device is comprised of one or more hard disk drives
or a tape storage device, then there will be latency delays
while the storage device performs electro-mechanical set-up
operations, e.g., moving an actuator arm to the read location,
to access the non-contiguous physical storage locations on
the disk drive storing the data records identified according to
the logically sequential keys. Such latency delays in access-
ing non-contiguous physical locations could cause substan-
tial delays in providing the application program with the
records during the sequential processing of the logically
sequential records.

The above described problem may arise when the Inter-
national Business Machines Corporation (IBM) Virtual Stor-
age Access Method (VSAM) is used to reference user data
records stored in a storage device. The storage device may
be a direct access storage device (DASD), which is com-
prised of numerous linked hard disk drives. A host computer
communicates I/O operations to a storage controller, such as
the IBM 3990 Storage Controller, which controls access to
the DASD. Physical locations on the DASD are identified
according to cylinder, track, and record on the track, i.e.,
CCHHR, where CC indicates the cylinder, HH indicates the
track, and R indicates the record on the track.

One data set structure VSAM utilizes to store data is the
Key Sequenced Data Set (KSDS). KSDS is particularly
applicable to data records that include a key value, which
provides a unique identifier of the record. The storage space
for a KSDS file is divided into a plurality of control areas
(CAs), which may comprise a single cylinder of storage
space or fifteen tracks. Each CA is comprised of multiple
control intervals (CIs). A CI can be comprised of one or
more records on the track. User data records are written to
particular CIs. For instance, a CA may be comprised of 15
tracks, e.g., tracks 0-14, including one or more CIs for each
track, and three free tracks, e.g., tracks 12, 13, and 14. The
user may specify a certain amount of free space in each CI
and CA. The free space is provided for insertion and
lengthening of records into a CI. The operating system
would write user data records according to their logical key
order to the first CI in a CA, i.e., CI0, until the CIs in the CA
were filled. With this system, data records would be written
to the CIs according to their key ordering, e.g., the data
record with the first key value as the first record in the first
CI in the first CA. Thus, the data records would be written
to the CIs according to the ordering of their keys at con-
tiguous physical locations defined by the CIs.

One cause for user data records arranged
according to the logical key value to be dispersed at non-
contiguous physical locations in the storage device is CI and
CA splitting. CI splitting occurs when an individual CI is
filled with records. For instance, a CA may be set
equivalent to a cylinder or 15 tracks, and tracks 0-11 are for
user data and tracks 12-14 are specified as free. If user data
records are written sequentially according to the key values
to contiguous physical locations within the CIs in tracks
0-11 of the CA and a user data record needs to be inserted
or lengthened, then the operating system 6 will move half
the user data records in the CI which is involved in the insert
or lengthening operation to a CI in the free space of the CA,
e.g., in track 12, to make room in the current CI for the
inserted or lengthened record. With this control interval (CI)
split, user data records, having a sequential logical key
ordering, that were previously in the same CI, at contiguous
physical locations, are now separated into non-contiguous
CIs. The user data records moved to the free space CI in
track 12 are no longer at contiguous physical locations with
respect to the logically contiguous user data records that
remained in the CI where the split occurred. For instance, if
a CI had user records with key values of A, B, C, and D, and
the C and D records were moved to track 12, then the C and
D records would no longer be contiguous on the storage
device to A and B. Hence, CI splitting causes user data
records that are sequentially ordered with respect to logical
index values to be stored at non-contiguous physical
locations, e.g., non-contiguous CCHHRs.

Further, if all the tracks, including the free tracks, e.g.,
tracks 0-14, in a CA are filled with user data records, then
the insertion or lengthening of a record in the CA will cause
a CA split, where certain records in the CA may have to be
moved to a new CA to make room for the inserted or
lengthened records. If the user data records were in a logical
sequential order with respect to a key index, then such
records moved to a different CA during the split would no
longer have a sequential physical ordering as they are in a
different CA. Such splitting may occur whenever a record is
inserted or an existing record is lengthened during sequential
or direct processing of user data records.
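
The effect of a CI split on physical contiguity can be sketched in Python as follows. This is a deliberately simplified model, not the VSAM algorithm: a CI is a sorted list of keys, the dictionary key stands in for a physical location such as a track number, and the function name insert_with_ci_split and the capacity of four records are assumptions.

```python
def insert_with_ci_split(cis, ci_index, free_index, key, capacity):
    """Insert key into cis[ci_index]; if that CI is full, first move its
    upper half to the free-space CI at cis[free_index]. Returns True on split."""
    ci = cis[ci_index]
    split = len(ci) >= capacity
    if split:
        half = len(ci) // 2
        cis[free_index] = ci[half:]        # upper half moves to the free CI
        cis[ci_index] = ci = ci[:half]     # lower half stays in place
        if key > cis[free_index][0]:
            ci = cis[free_index]           # new key belongs with the moved half
    ci.append(key)
    ci.sort()
    return split

# CI at location 0 holds keys A-D; location 12 is free space in the CA.
cis = {0: ["A", "B", "C", "D"], 12: []}
split = insert_with_ci_split(cis, 0, 12, "Bb", capacity=4)
```

After the insert, keys A, B, and Bb remain at location 0 while C and D sit at location 12, so records that are adjacent in key order are no longer physically adjacent.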

Problems arise when the operating system and storage
system fail to maintain sequential ordering in both the
logical and physical domains. An application program may 
process records in a sequential mode with respect to a logical 
sequential ordering of key values. However, such logically 
sequential records, for the reasons discussed above, may not 
be stored at contiguous physical locations, i.e., contiguous 
CIs and CCHHRs. In such case, delays may occur in 
retrieving these logically sequential records stored at non- 
contiguous physical locations. If the application program is 
performing fast sequential operations, then delays in access- 
ing the logically sequential data records at non-contiguous 
physical locations, which requires seek and rotation opera- 
tions on the disk surface, may substantially degrade the 
performance of the application program's sequential opera- 
tions. 

Hardware and Software Environment 

FIG. 1 illustrates a hardware and software environment in 
which preferred embodiments may be implemented. A host 
system 4 includes an operating system 6, an application 
program 8, and a KSDS index 12. The operating system 6 
could comprise the IBM ESA/390 operating system includ- 
ing a data management component such as the IBM Dis- 
tributed File Manager Storage Management System for 
MVS (DFSMS/MVS) software to provide the operating 
system 6 access to VSAM data sets and other data types. The 
application program 8 may comprise a batch program or a 
database program that performs sequential processing of 
data records according to a key ordering. 

A storage controller 20 receives input/output (I/O) opera- 
tions from the host system 4 and executes the received I/O 
operations against the direct access storage device (DASD) 
30. A cache 22 is comprised of one or more volatile memory 
devices. The storage controller 20 would stage data tracks 
into the cache 22 that are retrieved from the DASD 30 in 
anticipation of subsequent requests for the data. Further, the 
storage controller 20 may prestage data tracks into cache
before they are requested, in anticipation of such requests for 
the data. The DASD 30 may store data in a Count-Key-Data 
(CKD) format or fixed block format such as is used with 
SCSI. Details of a specific implementation of a storage
controller 20, operating system 6, and data transfer opera-
tions therebetween are described in the IBM publications:
"Enterprise Systems Architecture/390: ESCON I/O 
Interface," IBM document no. SA22-7202-02 (Copyright 
IBM Corp., 1990, 1991, 1992) and "IBM 3990 Storage 
Control Reference (Models 1, 2, and 3)," IBM document no.
GA32-0099-06 (Copyright IBM Corp., 1998, 1994), which 
publications are incorporated herein by reference in their 
entirety. 

In the CKD format, a count field provides the name and 
format of a record, the key length, and the data length. The 
key field, which is optional and unrelated to the index key 
used to provide logical ordering to the application program 
8 records, is used for searching and may indicate the last data 
set in the record. The data field provides a variable length 
record of user data sets. The number of CKD records that 
can be placed on a track depends on the length of the data 
areas of the records. The physical location of a CKD record 
on a track is identified according to the R value of the 
CCHHR physical location identifier. The user data area of a 
CKD record may include multiple user data records, such as
data records (r) used by the application program 8 that
correspond to key values (K). For instance, a track may
include twelve 4K CKD records, i.e., four R values in a
CCHH, and the user data area of each CKD record may be
comprised of multiple application records (r) having corre-
sponding key values (K).

The host system 4 may communicate with the storage
controller 20 via channel paths by executing channel pro-
grams. In such case, data transfer operations are performed
using channel command words (CCW) which are transferred
from the host system 4 to the storage controller 20 along a
channel path. The host system 4 may view the storage
controller 20 as a multitude of separate control unit images
or logical subsystems (LSSs), wherein each control unit
image provides access to one or more I/O devices or LSS
images of the DASD 30. Further details of how the host
system 4 may communicate with the storage controller 20
are described in the commonly assigned and co-pending
patent applications, all of which are incorporated herein by
reference in their entirety: "Method And System For
Dynamically Assigning Addresses To An Input/Output
Device," by Brent C. Beardsley, Allan S. Merritt, Michael A.
Paulsen, and Harry M. Yudenfriend, filed on Oct. 7, 1998,
and having U.S. Pat. Ser. No. 09/167,782, U.S. Pat. No.
6,185,638; "System For Accessing An Input/Output Device
Using Multiple Addresses," by Brent C. Beardsley, James L.
Iskiyan, James McIlvain, Phillip R. Mills, Michael A.
Paulsen, William G. Thompson, Harry M. Yudenfriend, filed
on Oct. 7, 1998, and having U.S. Pat. Ser. No. 09/168,017,
U.S. Pat. No. 6,170,023; "Method, System, and Program for
Performing Data Transfer Operations on User Data," by
Brent C. Beardsley and Michael A. Paulsen, filed on the
same date hereof and having U.S. Pat. Ser. No. 09/298,154,
U.S. Pat. No. 6,105,076.

The KSDS index 12 includes the key fields of the records
and a relative byte address (RBA) that points to the CI that
includes the user data record identified by the key field. The
operating system 6 can calculate a CCHH location from the
RBA as known in the art. The KSDS index 12 is organized
in a tree structure. The entry node in the tree includes the
high key value of the keys for the first sequential group of
user records (r) that fit in the first CI and an RBA pointing to
the physical CCHH location of the track including the first
CI. When a CI is filled with user records (r), then the next
group of user records, having key values sequential with
respect to the records in the filled CI, will be placed in the
next CI. The KSDS index 12 will then have a pointer from
the first CI in the tree to the second CI that includes the next
group of user records (r), sequential with respect to key
values (K). This second index entry in the KSDS index 12
includes the high key value of those records in this second
CI and the RBA of the second CI that includes such next
sequential set of records.

FIG. 2 provides an example of the arrangement of the
KSDS index 12. In this example, there are sequential keys
K0-K7, which are key values uniquely identifying user data
records (r). The first CI0 in the KSDS index 12 can store
records K0 and K1. Thus, the first entry in the KSDS index
12 would include the high key value of the user data records
in the first CI0 and the RBA0 of that CI0. In the example of
FIG. 2, the next group of user data records is placed in the
nth CIn, which is not physically contiguous to the first CI.
The CIn may have to be used for the next sequential group
of user data records, identified by keys K2 through K4, if
there is CI splitting. In the example of FIG. 2, CIn can store
the next three user data records corresponding to keys K2
thru K4. The second KSDS index 12 entry to which the first
index entry points would thus include the high key value K4
and the RBAn of CIn, which is not contiguous to the first
index entry RBA0. In the example of FIG. 2, CI2, which has a
physical location contiguous to the first CI0, can store the
next three data records, having keys K5 thru K7. Thus, the
third KSDS index 12 entry would include the high key value
K7 of the user data records stored in the CI2 located at RBA2,
which is at a physical contiguous location to the first CI0
starting at the physical location specified by RBA0. In this
way, the KSDS index 12 is logically contiguous with respect
to key values, corresponding to user data records (r), but
may not map those logically sequential data records, i.e., key
values (K), to contiguous physical locations. Details on the
VSAM implementation are described in IBM publication
"VSE/VSAM User's Guide and Application Programming,
Version 6 Release 1," IBM document no. SC33-6632-00
(Copyright IBM Corp. 1979, 1995) and the commonly
assigned patent entitled "Method and Means for Cataloging
Data Sets Using Dual Keyed Data Sets and Direct Pointers,"
U.S. Pat. No. 4,408,273, which publication and patent are
incorporated herein by reference in their entirety.
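
The index arrangement of FIG. 2 can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the
index is modeled as a flat sorted list rather than a tree, and the
names IndexEntry, find_ci, and FIG2_INDEX, as well as the RBA values,
are hypothetical and are not taken from VSAM.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class IndexEntry:
    high_key: int  # highest key stored in the control interval (CI)
    rba: int       # relative byte address of that CI (made-up value)


def find_ci(index: List[IndexEntry], key: int) -> Optional[IndexEntry]:
    """Return the entry of the CI that would hold `key`, or None."""
    high_keys = [entry.high_key for entry in index]
    pos = bisect_left(high_keys, key)  # first entry with high_key >= key
    return index[pos] if pos < len(index) else None


# FIG. 2 layout: CI0 holds K0-K1, CIn holds K2-K4, CI2 holds K5-K7.
# CIn is given a distant RBA to mimic the effect of a CI split.
FIG2_INDEX = [IndexEntry(1, 0), IndexEntry(4, 40960), IndexEntry(7, 4096)]
```

Looking up key 3 with find_ci(FIG2_INDEX, 3) returns the entry with
high key 4, whose RBA is far from that of the first CI even though the
keys are adjacent in the logical ordering.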

Data Transfer Operations 

The CCW format provides a format for a sequence of
commands used to transfer data between the host system 4
and storage controller 20. The first command in the chain is
a Define Extent command which defines the extent or range
of tracks in which a channel program will operate. An extent
is a set of consecutively addressed tracks that the channel
program in the host 4 can access. The limits of an extent are
defined by specifying the addresses of the first and last tracks
in the extent. The Define Extent command further defines
attributes of, and limitations on, the commands that follow
in the channel program. Following is a Locate Record or
Locate Record Extended command that specifies the
operations, the number of consecutive records (or tracks)
involved in the data operation, and the address of the first
track and the orientation state to establish before starting
data transfer. One or more read or write commands may
follow the Locate Record command to perform data transfer
operations. The storage controller 20 will perform the
requested operation with respect to the DASD 30 and
present status information to the host 4 indicating whether
the operation failed or successfully completed.
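
The chain ordering described above can be modeled as follows. This is
a simplified sketch, not the actual CCW encoding: the class names
DefineExtent, LocateRecord, and Transfer and the function check_chain
are hypothetical, and a track address is reduced to a
(cylinder, track) tuple.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union

Track = Tuple[int, int]  # (cylinder, track) address


@dataclass
class DefineExtent:
    first: Track  # first track of the extent
    last: Track   # last track of the extent


@dataclass
class LocateRecord:
    operation: str      # e.g. "read" or "write"
    first_track: Track  # address of the first track involved
    count: int          # number of consecutive records or tracks


@dataclass
class Transfer:
    track: Track  # track a read or write command operates on


Command = Union[DefineExtent, LocateRecord, Transfer]


def check_chain(chain: List[Command]) -> List[str]:
    """Return the problems found in a chain (empty list if none)."""
    if not chain or not isinstance(chain[0], DefineExtent):
        return ["chain must begin with Define Extent"]
    extent = chain[0]
    problems = []
    located = False
    for cmd in chain[1:]:
        if isinstance(cmd, LocateRecord):
            located = True
        elif isinstance(cmd, Transfer):
            if not located:
                problems.append(f"transfer on {cmd.track} precedes Locate Record")
            # Tuples compare lexicographically, which matches address order.
            if not extent.first <= cmd.track <= extent.last:
                problems.append(f"track {cmd.track} lies outside the extent")
    return problems
```

A chain made of a Define Extent, a Locate Record, and transfers inside
the extent yields an empty list; a transfer on a track beyond the
extent is reported.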

Preferred embodiments include an additional command,
referred to herein as a "Prestage Trackset" command, that is
utilized within a CCW chain to provide notification to the
storage controller 20 that a set of tracks will be accessed in
a future operation. The Prestage Trackset command is
included with a Locate Record Extended command and may
be specified with a Prestage Trackset operation code within
a Locate Record Extended parameter. If the Prestage Trackset
operation code is specified, then the Locate Record
Extended command would contain an Extended Parameter
that provides a bit map of a range of tracks. Values in the bit
map may be set to "0" or "1." A "1" value indicates that the
track corresponding to the bit map value is to be prestaged
into the cache 22, whereas a bit map value of "0" indicates
that the corresponding track is skipped and not prestaged
into cache 22. A Count parameter of the Locate Record
Extended command indicates the number of tracks to be
transferred with the Prestage Trackset operation. The Count
parameter is equal to the number of bit map values of one in
the Extended Parameter bit map, i.e., those tracks in the
range of sequential tracks to be prestaged.
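
The relationship between the Extended Parameter bit map and the Count
parameter can be illustrated with a short sketch. The function name
build_prestage_bitmap is hypothetical, and tracks are represented by
plain integers within one cylinder for simplicity.

```python
from typing import Iterable, List, Tuple


def build_prestage_bitmap(first: int, last: int,
                          to_stage: Iterable[int]) -> Tuple[List[int], int]:
    """Return (bit map, count) for tracks first..last inclusive.

    A bit is 1 when the corresponding track is to be prestaged and 0
    when it is skipped; count is the number of 1 bits.
    """
    wanted = set(to_stage)
    outside = sorted(t for t in wanted if not first <= t <= last)
    if outside:
        raise ValueError(f"tracks outside range {first}-{last}: {outside}")
    bits = [1 if track in wanted else 0 for track in range(first, last + 1)]
    return bits, sum(bits)
```

For a range of tracks 1 through 12 in which only tracks 1 and 12 are
to be prestaged, the bit map has twelve values with the first and last
set to one, and the count is two.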

The first bit in the bit map must be "1" and represents the
track whose address is specified in the seek address parameter.
Subsequent addressed tracks are in ascending order. In
preferred embodiments, tracks in the bit map represented by
one bits are not limited to the tracks contained within the
extent domain defined in the Define Extent.
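
How a receiver might expand such a bit map back into track addresses
can be sketched as follows. The function tracks_from_bitmap and the
default of 15 tracks per cylinder are assumptions made for this
illustration; heads are numbered from zero here.

```python
from typing import List, Sequence, Tuple


def tracks_from_bitmap(seek: Tuple[int, int], bits: Sequence[int],
                       tracks_per_cyl: int = 15) -> List[Tuple[int, int]]:
    """Expand a bit map into (cylinder, head) addresses.

    Bit 0 corresponds to the seek address; each following bit
    corresponds to the next track in ascending address order.
    """
    cylinder, head = seek
    start = cylinder * tracks_per_cyl + head  # linear track number
    selected = []
    for offset, bit in enumerate(bits):
        if bit:
            selected.append(divmod(start + offset, tracks_per_cyl))
    return selected
```

Because the bits are walked in ascending order from the seek address,
a bit map that runs past the last head of a cylinder simply continues
on the first head of the next cylinder.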

In preferred embodiments, a single CCW chain may 
include a Prestage Trackset command and data transfer 
commands. The Prestage Trackset command would prestage
data in anticipation of read requests for such data in subse- 
quent CCW chains. The read operations in the CCW chain 
including the Prestage Trackset should be for data tracks that 
were prestaged into cache in a previous CCW chain. In this 
way, in a single CCW chain, tracks that will be needed in 
future operations can be prestaged into cache and tracks 
previously prestaged can be read from cache. 

Data transfer operations in the CCW chain including the 
Prestage Trackset command would follow a Locate Record 
or Locate Record Extended command specifying such data 
transfer operations. This subsequent Locate Record or 
Locate Record Extended command would follow the Locate 
Record Extended Command including the Prestage Trackset 
operation. Thus, in preferred embodiments, the data transfer 
operations occur in a Locate Record domain following the 
execution and completion of the Prestage Trackset operation.
This ensures that tracks are prestaged before any
subsequent data transfer operations are performed. Further, 
in preferred embodiments, the Extended Parameter bit map 
used by the Prestage Trackset command may specify tracks 
to prestage that are not within the domain specified in the 
Define Extent command beginning the CCW chain includ- 
ing the Prestage Trackset operation. In a chain including 
both the Prestage Trackset operation and data transfer 
commands, the Define Extent domain may specify the range 
in which data operations are performed and Prestage Track- 
set operations may fall outside of this domain. In further 
embodiments, Prestage Trackset operations and read opera- 
tions can occur in any order in a CCW chain. 

FIG. 3 illustrates program logic to generate and process 
the Prestage Trackset command within both the host 4 and 
storage controller 20 systems. The operations shown in FIG. 
3 as implemented in the host 4 and storage controller 20 may 
be executed asynchronously. Control begins at block 50 
where the application program 8 initiates an operation to 
sequentially access numerous user data records according to 
a key index ordering assigned by the application program 8. 
The application program 8 maintains information on the 
logical sequential arrangement of user data records accord- 
ing to key (K) values, as well as information about free 
space; such information is unknown to the storage controller 
20. The operating system 6 would then process (at block 52) 
the KSDS index 12 to determine the RBAs pointing to CIs 
including the records (r) subject to the sequential access 
operations. The operating system 6 then calculates (at block 
54) the CCHHR ranges, i.e., CI ranges, from the determined 
RBAs including records subject to sequential access opera- 
tions. The CCHHR range of the CI can be determined from 
the RBA, as the RBA indicates the starting CCHHR of the CI
and the CI has a fixed length which is used to determine the 
ending CCHHR of the CI. As discussed, these CCHHR 
ranges, i.e., CIs, including the logically sequential user 
records (r) may be at non-contiguous physical locations. 
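
The conversion from an RBA to a CCHHR location can be sketched under
strongly simplifying assumptions. The geometry constants below, the
assumption that the data set occupies one contiguous extent starting
at base_track, the modeling of each fixed-length CI as one record on a
track, and the function names are all hypothetical and chosen only to
make the arithmetic concrete.

```python
from typing import Iterable, List, Tuple

CI_SIZE = 4096       # assumed fixed CI length in bytes
CIS_PER_TRACK = 12   # assumed number of CIs that fit on one track
TRACKS_PER_CYL = 15  # assumed number of tracks per cylinder


def rba_to_cchhr(rba: int, base_track: int = 0) -> Tuple[int, int, int]:
    """Map an RBA to (cylinder, head, record) under the assumptions above."""
    ci_number = rba // CI_SIZE                   # which CI the RBA falls in
    track = base_track + ci_number // CIS_PER_TRACK
    record = ci_number % CIS_PER_TRACK + 1       # records are numbered from 1
    cylinder, head = divmod(track, TRACKS_PER_CYL)
    return cylinder, head, record


def tracks_for_rbas(rbas: Iterable[int],
                    base_track: int = 0) -> List[Tuple[int, int]]:
    """Return the distinct (cylinder, head) tracks touched by the RBAs."""
    return sorted({rba_to_cchhr(rba, base_track)[:2] for rba in rbas})
```

The list returned by tracks_for_rbas is the kind of input from which a
bit map of tracks to prestage could then be built.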

The operating system 6 would then generate (at block 58)
a CCW chain including a Locate Record Extended command
indicating a range of tracks, a Prestage Trackset
operation code, and an Extended Parameter bit map
indicating, with a bit map value of one, all tracks within the
range of tracks that include user data records subject to the
sequential operation of the application program 8. For tracks
that contain nothing but free space, corresponding bits in the
Prestage Trackset bit map will always be zero. The operating
system 6 would then transfer (at block 60) the CCW chain to
the storage controller 20. In preferred embodiments, the CCW
chain including the Prestage Trackset operation may include
an additional Locate Record domain with read commands.
In response to receiving the Prestage Trackset command, the
storage controller 20 would retrieve (at block 62) the data on
the tracks having a corresponding bit map value of one in the
Extended Parameter bit map. In preferred embodiments, the
Prestage Trackset command causes the storage controller 20
to transfer the entire contents of a track, including the count
and key data, into the cache 22. After prestaging the data into
cache 22, the storage controller 20 would return a channel
end and device end status. Following the prestage operation,
the host 4 may then, within the same CCW chain, issue
data transfer operations.
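
The storage controller side of this exchange can be simulated with a
toy model. The class StorageController, its methods, and the use of
dictionaries for the DASD and the cache are hypothetical stand-ins
used only to show the effect of prestaging on later reads.

```python
from typing import Dict, Sequence


class StorageController:
    """Toy controller: stages whole tracks from DASD into a cache."""

    def __init__(self, dasd: Dict[int, str]) -> None:
        self.dasd = dasd                 # track number -> track contents
        self.cache: Dict[int, str] = {}
        self.dasd_reads = 0              # counts accesses to the DASD

    def _stage(self, track: int) -> None:
        self.cache[track] = self.dasd[track]
        self.dasd_reads += 1

    def prestage(self, first_track: int, bits: Sequence[int]) -> str:
        """Stage every track whose bit map value is one."""
        for offset, bit in enumerate(bits):
            if bit:
                self._stage(first_track + offset)
        return "channel end, device end"

    def read_track(self, track: int) -> str:
        """Return a track from cache, staging it first on a miss."""
        if track not in self.cache:
            self._stage(track)
        return self.cache[track]
```

After a prestage of tracks 1 and 12, a read of track 12 is served from
the cache and the DASD access counter does not change.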

In this way, with the Prestage Trackset command, the host 
4 can have data records that may be stored at non-contiguous 
physical locations staged into cache 22 in anticipation that a 
host 4 application program, such as the application program 
8, is sequentially accessing such data records according to a 
logical arrangement of data, e.g., according to key. 

The Prestage Trackset command of the preferred embodiments
may be utilized with the Read Track Data command
described in the co-pending and commonly assigned patent
application entitled, "Method, System, and Program for
Performing Data Transfer Operations on User Data," by
Brent C. Beardsley and Michael A. Paulsen, filed on the
same date hereof and having U.S. Pat. No. 6,105,076,
incorporated by reference above. The Read Track Data
command requests the storage controller 20 to transfer to the
host 4 all the user data records on a track, following the first
user data record, R0, free of any of the count and key data
on the track.

FIG. 4 illustrates how user data records r0-r11, which are
organized sequentially with respect to keys K0-K11, map to
non-contiguous physical locations, e.g., CCHHR locations,
in the DASD 30. As discussed, the data records (r) are not
stored sequentially to match the logical sequential key
ordering as a result of CI and CA splitting. For example, r0,
which is the first record in the key ordering, is in the CI
starting at cylinder 1 (C1), track 1 (T1), and the first CKD
record (R1). The next record r1 in the logical sequential
ordering, having key value K2, is stored in the same CI as r0.
User record r2 is stored in track 12. Thus, the logically
sequential user records r0-r2 are stored at non-contiguous
physical locations.

Below is pseudo code for four CCW chains generated by
the host 4 to prestage data before and during the application
program 8 sequentially accessing the records corresponding
to the first twelve key values, K0 thru K11, as shown in FIG.
4. A CCW chain may use the Prestage Trackset command in
combination with one or more Read Track Data commands.
Two Locate Record Extended (LRE) commands may be
included in a CCW chain, one for the Prestage Trackset
command and the other for a string of Read Track Data
commands. The Locate Record Extended command for the
Prestage Trackset operation would prestage tracks having a
bit map value of one in the Extended Parameter bit map. A
second Locate Record Extended command may follow the
Locate Record Extended command for the Prestage Trackset
operation. This second Locate Record Extended command
may specify an Extended Parameter bit map of non-sequential
tracks in a range of tracks subject to a sequence
of Read Track Data commands to read user data records
from the tracks having a bit map value of one. In preferred
embodiments, the host 4 would transfer the read commands
and accompanying Locate Record command after receiving
channel end and device end status indicating completion of
the prestage operation.

Below are four exemplar operations to prestage user data
records and transfer user data records while the application
program 8 is sequentially accessing and processing records
r0 thru r11, ordered sequentially with respect to key values
K0 thru K11, as shown in FIG. 4.

Operation 1:

Define Extent (C1-T1 to C1-T12);
Locate Record Extended (Prestage Trackset, Bit map of
twelve values representing tracks 1-12, with bit map
values of 1 for tracks 1 and 12);

Operation 2:

Define Extent (C1-T1 to C1-T12);
Locate Record Extended (Prestage Trackset, Bit map of 13
values representing tracks 1-13, with bit map values for
tracks 5, 6, and 13 set to 1);
Locate Record Extended (Read Track op code, Read
Trackset with bit map of twelve values with bit map
values for tracks 1 and 12 set to 1);
Read Track Data from track 1;
Read Track Data from track 12;

Operation 3:

Define Extent (Range C1-T5 to C1-T13);
Locate Record Extended (Prestage Trackset, Bit map with
twelve values with bit map values for tracks 1 and 12 in
cylinder 2 set to 1);
Locate Record Extended (Read Track op code, Read
Trackset with bit map of nine values with bit map values
for tracks 5, 6, and 13 in cylinder 1 set to 1);
Read Track Data from track 5 in cylinder 1;
Read Track Data from track 6 in cylinder 1;
Read Track Data from track 13 in cylinder 1;

Operation 4:

Define Extent (Range C2-T1 to C2-T12);
Locate Record Extended (Read Track op code, Read
Trackset with bit map of twelve values with bit map
values for tracks 1 and 12 in cylinder 2 set to 1);
Read Track Data from track 1 in cylinder 2;
Read Track Data from track 12 in cylinder 2;

In operation 1, the host 4 is performing set-up operations
to prestage the tracks including the first four records to be
sequentially accessed, r0 to r3, by the application program 8.
As discussed, there may be multiple user records (r) in a
CKD record identified according to CCHHR. The host 4
sends a Prestage Trackset command to the storage controller
20 to prestage tracks 1 and 12 from cylinder 1 into the cache
22. The parameters in the Define Extent command refer to
the start and end of the extent in which the following
operations will be performed.

After prestaging the first set of records, r0 to r3, the host
4 will issue a CCW operation 2 including a Prestage
Trackset operation to prestage the tracks including the next
four records, r4 to r7, to be sequentially accessed. The Define
Extent command specifies a range of tracks, tracks 1-12,
that include the tracks that will be read in the CCW chain of
operation 2, and does not include the tracks to prestage that
are outside of the domain of the tracks to read. For this
reason, the Define Extent command in Operation 2 does not
specify tracks to prestage, such as track 13, that are outside
of the domain including the tracks to read. Following is a
Locate Record Extended command including a bit map
indicating the tracks T1 and T12 that include the first four
records, R0 to R3, that the application program 8 will access.
Following is a sequence of Read Track Data commands to
read all the user data records, free of any count or key
information, from tracks 1 and 12 including user data
records R0 to R3 which were prestaged into cache 22 in the
CCW chain of operation 1. The sequence of Read Track
Data commands applies to the tracks indicated in the Read
Trackset bit map having a corresponding bit map value of
one. Alternatively, the host 4 may use a sequence of Read
Data commands to read a specific user data record from the
tracks previously prestaged into cache 22, instead of using
the Read Track Data command to read all the records from
a track previously prestaged into cache 22. In both cases,
because the requested data was prestaged into cache 22, the
storage controller 20 may return the data from cache 22.

After operation 2, the application program 8 may begin
performing operations on the first four sequential user data
records R0 to R3 after the records are transferred from the
cache 22 to the host 4. As the application program 8 is
processing the first four records R0 to R3 as part of a
sequential operation, the host 4 performs operation 3 by
issuing a Locate Record Extended command indicating the
Prestage Trackset operation to cause the storage controller
20 to prestage from the DASD 30 into cache 22 those tracks
T1 and T12 in cylinder 2 including the user data records R8
to R11. The host 4 would specify in the Extended Parameter
bit map that tracks 1 and 12 on cylinder 2 are to be prestaged
into cache 22. The second Locate Record Extended command
includes a bit map indicating that tracks 5, 6, and 13
are subject to a series of read operations to read the user data
records from the CKD records in tracks 5, 6, and 13, which
were previously prestaged into cache 22 in operation 2. The
read operation may comprise a Read Track Data command
or Read Data command to read specific user data records
prestaged into cache 22.

After performing operation 3, the application program 8
may then process user data records R4 to R7. While processing
these records, the host 4 may then perform operation
4 to retrieve the final set of logically sequential records R8
to R11 that the application program 8 will process. Operation
4 consists of a Locate Record Extended command indicating
a read operation code, for a Read Track Data command or
Read Data command, to be performed on the tracks corresponding
to bit map values of one in the Extended Parameter
bit map provided with the Locate Record Extended command.

With the above four operations, the host 4 may cause the 
storage controller 20 to prestage non-contiguous tracks into 
cache 22 in anticipation that the host 4 will later request user 
data records from these tracks for a sequential operation 
performed on user data records arranged in a logical 
sequence according to a key or other index type value. In this 
way, the host 4 can retrieve user data records the application 
program 8 needs without having to wait for the storage 
controller 20 to perform mechanical set-up operations to 
retrieve the data from DASD 30, e.g., track and seek 
movements to position a read head or tape set-up operations. 
Instead, the storage controller 20 may return the requested 
data directly from cache 22. 
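
The overlap between prestaging and reading across the four operations
can be expressed as a simple schedule generator. The function
pipeline and the dictionary layout of each operation are hypothetical;
the track groups below are those named in operations 1 through 4.

```python
from typing import Dict, List, Optional, Tuple

Track = Tuple[int, int]  # (cylinder, track)


def pipeline(groups: List[List[Track]]) -> List[Dict[str, Optional[List[Track]]]]:
    """Build one operation per step: prestage group i, read group i - 1.

    The first operation only prestages and the last one only reads,
    so N groups of tracks produce N + 1 operations.
    """
    operations = []
    for i in range(len(groups) + 1):
        operations.append({
            "prestage": groups[i] if i < len(groups) else None,
            "read": groups[i - 1] if i > 0 else None,
        })
    return operations


# Track groups holding records 0-3, 4-7, and 8-11 in the example.
FIG4_GROUPS = [
    [(1, 1), (1, 12)],
    [(1, 5), (1, 6), (1, 13)],
    [(2, 1), (2, 12)],
]
```

Calling pipeline(FIG4_GROUPS) yields four operations: the first
prestages tracks 1 and 12 of cylinder 1 and reads nothing, and the
last reads tracks 1 and 12 of cylinder 2 and prestages nothing.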

Preferred embodiments thus allow an application performing
a sequential processing operation on user data
records according to a logical sequential ordering to immediately
access the user data records that are in the logical
sequential order, but stored at non-contiguous physical locations.
This provides an operation sequentially accessing
records according to a logical key ordering that are stored at
non-contiguous physical locations with the same performance
that is achieved when performing a sequential operation
on data that is stored at contiguous physical locations.

Conclusion

This concludes the description of the preferred embodiments
of the invention. The following describes some alternative
embodiments for accomplishing the present invention.

The preferred embodiments may be implemented as a 
method, apparatus or article of manufacture using standard 
programming and/or engineering techniques to produce 
software, firmware, hardware, or any combination thereof. 
The term "article of manufacture" (or alternatively, "com- 
puter program product") as used herein is intended to 
encompass one or more computer programs and data files 
accessible from one or more computer-readable devices, 
carriers, or media, such as a magnetic storage media, "floppy 
disk," CD-ROM, a file server providing access to the 
programs via a network transmission line, holographic unit, 
etc. Of course, those skilled in the art will recognize many 
modifications may be made to this configuration without 
departing from the scope of the present invention. 

Preferred embodiments were described with respect to
sequential data transfer operations which involve reading or
writing numerous user data records in a logical sequential
relationship. However, in alternative embodiments, the preferred
embodiment commands, such as the Prestage Trackset
command, may be used to prestage tracks in preparation
for non-sequential, direct processing or random data transfer
operations.

Preferred embodiments provided specific naming conventions
for the data transfer operations described herein, such
as Prestage Trackset, Read Data, Read Track Data, etc.
However, any naming scheme or format may be used in
implementing the commands which perform the functions
described herein.

Preferred embodiments were described with respect to a
storage controller, host, and DASD system. In alternative
embodiments, the preferred embodiment commands may be
used with any type of storage system arrangement, where
one processing unit performs data operations with respect to
a storage device by communicating with another processing
unit that manages and controls access to the storage device.
The storage device storing the data may be any storage
device known in the art, including volatile and non-volatile
storage devices, such as tape drives, flash memory, optical
disk drives, etc. For instance, the commands may be used
with any processor-to-processor communication, regardless
of the environment in which the processors are
implemented, i.e., the same computer, a network
environment, etc. Further, the cache 22 may be any type of
volatile or non-volatile storage area utilized by the storage
controller 20 for data transfer operations.

Preferred embodiments were described as implemented
with certain operating system and application programs.
However, these components were described as one preferred
implementation. The preferred embodiment commands may
be utilized in any operating system environment and with
any application program which sequentially processes user
data records maintained in a logical sequential ordering.
Preferred embodiments are particularly applicable to situations
where the application program wants the performance
of a sequential type operation on the logically arranged
records.

Preferred embodiments were described with respect to the
CKD record format, where user data is stored in CKD
records on a track, such that each CKD record includes a
count field and may include a key field. The preferred
embodiment commands may apply to other data storage
formats in which data is stored in records that include index
information, such as the count and/or key type information,
along with the user data. Further preferred embodiments
may apply to the SCSI storage format which stores data in
fixed blocks without the use of index information with each
data record. The prestaging methods of the preferred
embodiments may further be used with the partition data set
extended (PDSE) storage format in which records are stored
as fixed block records.

In the SCSI or PDSE formats, the preferred embodiment
commands may be used to prestage data stored at non-sequential
fixed block addresses on the storage device to
cache in anticipation of sequential data operations performed
on data records physically stored at non-sequential
fixed block addresses on the storage device. Thus, those
skilled in the art will appreciate that the preferred embodiment
commands may apply to any data storage format where
data records maintained in a logical sequential ordering may
nonetheless be stored at non-sequential physical locations on
the storage medium.

Preferred embodiments were described with respect to a
KSDS index used to map logically sequential records to
relative byte addresses (RBAs) that are converted to CCHHR
locations. However, the index used to map the logically
sequential records to physical locations may comprise any
such mapping and indexing technique known in the art. For
instance, if the data format is SCSI, then the index may
include entries indicating fixed block addresses as the starting
address of a range of logically sequential records, as
opposed to the RBA value that indicates a CCHHR location.

Preferred embodiments were described with respect to a
bit map data structure that indicated tracks to prestage. In
alternative embodiments, the unit to prestage may be different
than a track, such as one or more records, fixed
blocks, etc. Still further, data structures other than a bit map
may be used to indicate the data unit to prestage into cache.

In summary, preferred embodiments disclose a method,
system, and program for prestaging data into cache from a
storage system in preparation for data transfer operations. A
first processing unit communicates data transfer operations
to a second processing unit that controls access to the storage
system. The first processing unit determines addressable
locations in the storage system of data to prestage into cache
and generates a data structure capable of indicating contiguous
and non-contiguous addressable locations
in the storage system including the data to prestage
into the cache. The first processing unit transmits a prestage
command to the second processing unit. The prestage command
causes the second processing unit to prestage into
cache the data at the addressable locations indicated in the
data structure. The first processing unit then requests data at
the addressable locations indicated in the data structure. In
response, the second processing unit returns the requested
data from the cache.

The foregoing description of the preferred embodiments
of the invention has been presented for the purposes of
illustration and description. It is not intended to be exhaustive
or to limit the invention to the precise form disclosed.
Many modifications and variations are possible in light of
the above teaching. It is intended that the scope of the
invention be limited not by this detailed description, but
rather by the claims appended hereto. The above
specification, examples and data provide a complete description
of the manufacture and use of the composition of the
invention. Since many embodiments of the invention can be
made without departing from the spirit and scope of the
invention, the invention resides in the claims hereinafter
appended.

What is claimed is:

1. A method for prestaging data into cache from a storage
system in preparation for data transfer operations, wherein a
first processing unit communicates data transfer operations
to a second processing unit that controls access to the storage
system, comprising the first processing unit:

(a) determining addressable locations in the storage system
of data to prestage into cache;

(b) generating a data structure capable of indicating
contiguous and non-contiguous addressable locations
in the storage system including the data to prestage into
cache by:

(i) generating a bit map data structure having bit map
values for addressable locations in the storage system;
and

(ii) setting to one the bit map values corresponding to
the addressable locations including the data to prestage
into cache;

(c) transmitting a prestage command to the second processing
unit which controls access to the storage
system, wherein the prestage command causes the
second processing unit to prestage into cache the data
at the addressable locations indicated in the data structure;
and

(d) requesting data at the addressable locations indicated
in the data structure, wherein the second processing
unit returns the requested data from the cache.

2. The method of claim 1, wherein determining the
addressable locations to prestage, comprises:

determining, with the first processing unit, data having a
logical sequential ordering; and

determining, with the first processing unit, the addressable
locations in the storage system of the data having the
logical sequential ordering, wherein the determined
addressable locations include the data to prestage into
the cache.

3. The method of claim 2, wherein the addressable locations
in the storage system including the data having the
logical sequential ordering are at non-contiguous addressable
locations in the storage system.

4. The method of claim 2, wherein the determination of
the addressable locations of the data having the logical
sequential ordering is determined from a Volume Storage
Access Method (VSAM) Key Sequenced Data Set (KSDS)
index.

5. The method of claim 1, wherein the storage system
storage space is logically divided into multiple tracks,
wherein each track includes one or more data records,
wherein each data record includes an index area providing
index information on the content of the data record and a
user data area including user data, wherein the addressable
locations indicated in the data structure comprise tracks in
the storage system including the data records to prestage into
the cache.

6. The method of claim 1, wherein the requested data that
the second processing unit returns from cache was prestaged
into cache in a command sequence preceding the command
sequence including the data request.

7. A method for prestaging data into cache from a storage
system in preparation for data transfer operations, wherein a
first processing unit communicates data transfer operations
to a second processing unit that controls access to the storage
system, comprising the second processing unit:

receiving a prestage command from the first processing
unit and a data structure capable of indicating contiguous
and non-contiguous addressable locations in the
storage system including the data to prestage into the
cache, wherein the data structure comprises a bit map 
data structure having bit map values for addressable 
locations in the storage system, wherein the bit map 
values corresponding to the addressable locations 
including the data to prestage into cache are set to one; 

prestaging into the cache the data at the addressable 
locations indicated in the data structure; 

receiving a data request from the first processing unit for 
data at the addressable locations indicated in the data 
structure; and 

returning to the first processing unit the requested data
from the cache. 

8. The method of claim 7, wherein the data structure 
comprises a bit map data structure having bit map values for 
addressable locations in the storage system, wherein the 
second processing unit prestages into cache data at the 
addressable locations having a corresponding bit map value 
of one. 

9. The method of claim 7, wherein the addressable locations
in the data structure correspond to data having a logical
sequential ordering within the first processing unit. 

10. The method of claim 9, wherein the addressable 
locations in the storage system including the data having the 
logical sequential ordering are at non-contiguous address-
able locations in the storage system.
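
As a hedged illustration of claims 9 and 10, the sketch below decodes one bit map into runs of adjacent tracks, showing that a single structure can describe both contiguous and non-contiguous locations. The function name `bitmap_to_runs` and the list-of-bits representation are assumptions made for the example.

```python
def bitmap_to_runs(bits, first_track=0):
    """Group set bits into (start_track, length) runs of adjacent tracks."""
    runs = []
    run_start = None
    for offset, bit in enumerate(bits):
        if bit and run_start is None:
            run_start = offset                       # a new run begins here
        elif not bit and run_start is not None:
            runs.append((first_track + run_start, offset - run_start))
            run_start = None                         # the run just ended
    if run_start is not None:                        # run reaching the last bit
        runs.append((first_track + run_start, len(bits) - run_start))
    return runs


runs = bitmap_to_runs([1, 1, 0, 0, 1, 0, 1, 1, 1])
# runs == [(0, 2), (4, 1), (6, 3)]
```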

11. The method of claim 7, wherein the storage system 
storage space is logically divided into multiple tracks,
wherein each track includes one or more data records, 
wherein each data record includes an index area providing 
index information on the content of the data record and a 
user data area including user data, wherein the addressable 
locations indicated in the data structure comprise tracks in 
the storage system including the data records to prestage into 
the cache. 

12. The method of claim 7, wherein the data request from
the first processing unit is for data that was prestaged into 
cache in a command sequence preceding the command 
sequence including the data request. 

13. A method for prestaging data into cache from a storage
system in preparation for data transfer operations, wherein a 
first processing unit communicates data transfer operations 
to a second processing unit that controls access to the storage 
system, wherein the storage system storage space is logi- 
cally divided into multiple tracks, wherein each track 
includes one or more data records, wherein each data record 
includes an index area providing index information on the 
content of the data record and a user data area including user 
data, comprising the first processing unit: 

determining addressable locations in the storage system of 
data to prestage into cache; 

generating a data structure capable of indicating contigu- 
ous and non-contiguous addressable locations in the 
storage system including the tracks to prestage into the 
cache by: 

(i) generating a bit map data structure having bit map 
values for addressable locations in the storage sys- 
tem; and 

(ii) setting to one the bit map values corresponding to 
the addressable locations including the tracks to 
prestage into cache; 

transmitting a prestage command to the second processing 
unit which controls access to the storage system, 
wherein the prestage command causes the second pro- 
cessing unit to prestage into cache the tracks at the 
addressable locations indicated in the data structure; 
and

requesting data at the addressable locations indicated in
the data structure, wherein the second processing unit 
returns the requested data from the cache. 
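
The host-side steps (i) and (ii) of claim 13 might be sketched as below. This is an assumption-laden example rather than the claimed implementation: the function name `build_prestage_bitmap`, the most-significant-bit-first packing, and the `first_track`/`track_count` parameters are hypothetical choices.

```python
def build_prestage_bitmap(tracks, first_track, track_count):
    """Return a bytes bit map covering track_count tracks from first_track.

    Bit i (most significant bit first, an assumed convention) is set to one
    when track first_track + i is to be prestaged.
    """
    bitmap = bytearray((track_count + 7) // 8)       # step (i): all values zero
    for track in tracks:
        offset = track - first_track
        if not 0 <= offset < track_count:
            raise ValueError(f"track {track} outside bit map range")
        bitmap[offset // 8] |= 0x80 >> (offset % 8)  # step (ii): set to one
    return bytes(bitmap)


bitmap = build_prestage_bitmap([100, 103, 104, 111],
                               first_track=100, track_count=16)
# bitmap.hex() == "9810"
```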

14. A method for prestaging data into cache from a storage
system in preparation for data transfer operations, wherein a 
first processing unit communicates data transfer operations 
to a second processing unit that controls access to the storage
system, wherein the storage system storage space is logi- 
cally divided into multiple tracks, wherein each track 
includes one or more data records, wherein each data record 
includes an index area providing index information on the 
content of the data record and a user data area including user 
data, comprising the second processing unit: 

receiving a prestage command from the first processing 
unit and a data structure capable of indicating a con- 
tiguous and non-contiguous range of addressable loca- 
tions in the storage system including the tracks to 
prestage into the cache, wherein the data structure 
comprises a bit map data structure having bit map 
values for addressable locations in the storage system, 
wherein the bit map values corresponding to the 
addressable locations including the tracks to prestage 
into cache are set to one; 

prestaging into the cache the tracks at the addressable 
locations indicated in the data structure; 

receiving a data request from the first processing unit for 
data at the addressable locations indicated in the data 
structure; and 

returning to the first processing unit the requested data 
from the cache. 

15. A system for transferring commands to a controller to 
prestage data into a cache from a storage system controlled 
by the controller in preparation for data transfer operations, 
comprising: 

a processing unit; 

program logic executed by the processing unit, compris- 
ing: 

(i) means for determining addressable locations in the 
storage system of data to prestage into cache; 

(ii) means for generating a data structure capable of 
indicating contiguous and non-contiguous address- 
able locations in the storage system including the 
data to prestage into the cache by: 

(a) generating a bit map data structure having bit 
map values for addressable locations in the storage 
system; and 

(b) setting to one the bit map values corresponding to 
the addressable locations including the data to 
prestage into cache; 

(iii) means for transmitting a prestage command to the 
controller, wherein the prestage command causes the 
controller to prestage into cache the data at the 
addressable locations indicated in the data structure; 
and 

(iv) means for requesting data at the addressable loca- 
tions indicated in the data structure from the 
controller, wherein the controller returns the 
requested data from the cache. 

16. The system of claim 15, wherein the program logic for 
determining the addressable locations to prestage, further
comprises: 

means for determining data having a logical sequential 
ordering; and 

means for determining the addressable locations in the 
storage system of the data having the logical sequential 
ordering, wherein the determined addressable locations 
include the data to prestage into the cache.

17. The system of claim 16, wherein the addressable
locations in the storage system including the data having the 
logical sequential ordering are at non-contiguous address- 
able locations in the storage system. 

18. The system of claim 16, wherein the determination of 
the addressable locations of the data having the logical 
sequential ordering is determined from a Volume Storage 
Access Method (VSAM) Key Sequenced Data Set (KSDS) 
index. 
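
As a rough illustration of claim 18, the sketch below walks a simplified key-ordered index to collect the tracks holding a key range. The index model, a sorted list of `(highest_key_on_track, track)` pairs, is a hypothetical simplification and does not reproduce the actual VSAM KSDS index format; `tracks_for_key_range` is likewise an invented name.

```python
from bisect import bisect_left


def tracks_for_key_range(index, low_key, high_key):
    """Return, in key order, the tracks covering keys low_key..high_key.

    index is a list of (highest_key_on_track, track) pairs sorted by key.
    """
    high_keys = [key for key, _ in index]
    start = bisect_left(high_keys, low_key)   # first entry whose high key >= low_key
    tracks = []
    for key, track in index[start:]:
        tracks.append(track)
        if key >= high_key:                   # this track holds the end of the range
            break
    return tracks


index = [("D", 40), ("H", 12), ("M", 97), ("R", 13), ("Z", 55)]
tracks = tracks_for_key_range(index, "E", "N")
# tracks == [12, 97, 13]: logically sequential data at non-contiguous tracks
```

The resulting track list is the kind of input from which a bit map of locations to prestage could then be generated.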

19. The system of claim 15, wherein the storage system 
storage space is logically divided into multiple tracks, 
wherein each track includes one or more data records,
wherein each data record includes an index area providing 
index information on the content of the data record and a 
user data area including user data, wherein the addressable 
locations indicated in the data structure comprise tracks in 
the storage system including the data records to prestage into 
the cache. 

20. The system of claim 15, wherein the requested data is 
for data that was prestaged into cache in a command 
sequence preceding the command sequence including the 
data request. 

21. The system of claim 15, wherein data in the storage 
system is in a count-key-data format, and wherein the 
system comprises a host computer system and the controller 
comprises a storage controller, and the storage system 
comprises a direct access storage device (DASD). 
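
The track and record layout referred to in claims 19 and 21 can be pictured with the following sketch. It is a simplified model offered under assumptions: the class names `Record` and `Track` and the three-element `count` tuple are hypothetical, and the real count-key-data count area carries further fields.

```python
from dataclasses import dataclass, field


@dataclass
class Record:
    record_number: int
    key: bytes          # key area
    user_data: bytes    # user data area

    @property
    def count(self):
        # Index-style information about the record: its number on the
        # track and the lengths of its key and user data areas.
        return (self.record_number, len(self.key), len(self.user_data))


@dataclass
class Track:
    cylinder: int
    head: int
    records: list = field(default_factory=list)


track = Track(cylinder=3, head=1)
track.records.append(Record(1, b"ACCT0001", b"payload-a"))
track.records.append(Record(2, b"ACCT0002", b"payload-bb"))
counts = [record.count for record in track.records]
# counts == [(1, 8, 9), (2, 8, 10)]
```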

22. A controller for prestaging data in preparation for data 
transfer operations from a computer system, wherein the 
controller controls access to a storage system, comprising: 

a processing unit; 

a cache accessible to the processing unit;

program logic executed by the processing unit, compris-
ing: 

(i) receiving a prestage command from the computer 
system and a data structure capable of indicating 
contiguous and non-contiguous addressable loca- 
tions in the storage system including the data to 
prestage into the cache, wherein the data structure 
comprises a bit map data structure having bit map
values for addressable locations in the storage 
system, wherein the bit map values corresponding to 
the addressable locations including the data to pre- 
stage into cache are set to one; 

(ii) prestaging into the cache the data at the addressable 
locations indicated in the data structure;

(iii) receiving a data request from the computer system 
for data at the addressable locations indicated in the 
data structure; and 

(iv) returning to the computer system the requested data 
from the cache. 

23. The controller of claim 22, wherein the data structure 
comprises a bit map data structure having bit map values for 
addressable locations in the storage system, wherein the 
controller prestages into cache data at the addressable loca-
tions having a corresponding bit map value of one.

24. The controller of claim 22, wherein the addressable 
locations in the data structure correspond to data having a 
logical sequential ordering within the computer system. 

25. The controller of claim 24, wherein the addressable 
locations in the storage system including the data having the 
logical sequential ordering are at non-contiguous address- 
able locations in the storage system. 

26. The controller of claim 22, wherein the storage system 
storage space is logically divided into multiple tracks, 
wherein each track includes one or more data records, 
wherein each data record includes an index area providing
index information on the content of the data record and a
user data area including user data, wherein the addressable 
locations indicated in the data structure comprise tracks in
the storage system including the data records to prestage into 
the cache. 

27. The controller of claim 22, wherein the data request 
from the computer system is for data that was prestaged into 
cache in a command sequence preceding the command 
sequence including the data request. 

28. The controller of claim 22, wherein data in the storage 
system is in a count-key-data format, and wherein the 
computer system comprises a host computer system and the 
controller comprises a storage controller, and the storage 
system comprises a direct access storage device (DASD). 

29. A system for transferring commands to a controller to 
prestage data into a cache from a storage system controlled 
by the controller in preparation for data transfer operations, 
wherein the storage system storage space is logically divided 
into multiple tracks, wherein each track includes one or 
more data records, wherein each data record includes an 
index area providing index information on the content of the
data record and a user data area including user data, com- 
prising: 

a processing unit; 

program logic executed by the processing unit, compris- 
ing: 

(i) means for determining addressable locations in the 
storage system of data to prestage into cache; 

(ii) means for generating a data structure capable of 
indicating contiguous and non-contiguous address- 
able locations in the storage system including the 
tracks to prestage into the cache by: 

(a) generating a bit map data structure having bit
map values for addressable locations in the storage 
system; and 

(b) setting to one the bit map values corresponding to 
the addressable locations including the tracks to 
prestage into cache; 

(iii) means for transmitting a prestage command to the 
controller, wherein the prestage command causes the 
controller to prestage into cache the tracks at the 
addressable locations indicated in the data structure; 
and 

(iv) means for requesting data at the addressable loca- 
tions indicated in the data structure from the 
controller, wherein the controller returns the 
requested data from the cache. 

30. A controller for prestaging data in preparation for data
transfer operations from a computer system, wherein the 
controller controls access to a storage system, wherein the 
storage system storage space is logically divided into mul- 
tiple tracks, wherein each track includes one or more data 
records, wherein each data record includes an index area 
providing index information on the content of the data 
record and a user data area including user data, comprising: 

a processing unit; 

a cache accessible to the processing unit;

program logic executed by the processing unit, compris-
ing: 

(i) receiving a prestage command from the computer 
system and a data structure capable of indicating 
contiguous and non-contiguous addressable loca- 
tions in the storage system including the data to 
prestage into the cache, wherein the data structure 
comprises a bit map data structure having bit map
values for addressable locations in the storage
system, wherein the bit map values corresponding to
the addressable locations including the data to pre-
stage into cache are set to one;

(ii) prestaging into the cache the tracks at the addres-
sable locations indicated in the data structure;

(iii) receiving a data request from the computer system
for data at the addressable locations indicated in the
data structure; and

(iv) returning to the computer system unit the requested
data from the cache.

31. An article of manufacture including prestage com-
mands to prestage data into cache from a storage system in
preparation for data transfer operations, wherein a first
processing unit communicates data transfer operations to a
second processing unit that controls access to the storage
system, the article of manufacture comprising computer
readable storage media including at least one computer
program embedded therein that is capable of causing the first
processing unit to perform:

(a) determining addressable locations in the storage sys-
tem of data to prestage into cache;

(b) generating a data structure capable of indicating
contiguous and non-contiguous addressable locations
in the storage system including the data to prestage into
the cache by:

(i) generating a bit map data structure having bit map
values for addressable locations in the storage sys-
tem; and

(ii) setting to one the bit map values corresponding to
the addressable locations including the data to pre-
stage into cache;

(c) transmitting a prestage command to the second pro-
cessing unit which controls access to the storage
system, wherein the prestage command causes the
second processing unit to prestage into cache the data
at the addressable locations indicated in the data struc-
ture; and

(d) requesting data at the addressable locations indicated
in the data structure, wherein the second processing
unit returns the requested data from the cache.

32. The article of manufacture of claim 31, wherein
determining the addressable locations to prestage, com-
prises:

determining, with the first processing unit, data having a
logical sequential ordering; and

determining, with the first processing unit, the addressable
locations in the storage system of the data having the
logical sequential ordering, wherein the determined
addressable locations include the data to prestage into
the cache.

33. The article of manufacture of claim 32, wherein the
determination of the addressable locations of the data having
the logical sequential ordering is determined from a Volume
Storage Access Method (VSAM) Key Sequenced Data Set
(KSDS) index.

34. The article of manufacture of claim 32, wherein the
addressable locations in the storage system including the
data having the logical sequential ordering are at non-
contiguous addressable locations in the storage system.

35. The article of manufacture of claim 31, wherein the
storage system storage space is logically divided into mul-
tiple tracks, wherein each track includes one or more data
records, wherein each data record includes an index area
providing index information on the content of the data
record and a user data area including user data, wherein the
addressable locations indicated in the data structure com-
prise tracks in the storage system including the data records
to prestage into the cache.

36. The article of manufacture of claim 31, wherein the
requested data that the second processing unit returns from
cache was prestaged into cache in a command sequence
preceding the command sequence including the data request.

37. An article of manufacture including prestage com-
mands to prestage data into cache from a storage system in
preparation for data transfer operations, wherein a first
processing unit communicates data transfer operations to a
second processing unit that controls access to the storage
system, the article of manufacture comprising computer
readable storage media including at least one computer
program embedded therein that is capable of causing the
second processing unit to perform:

receiving a prestage command from the first processing
unit and a data structure capable of indicating contigu-
ous and non-contiguous addressable locations in the
storage system including the data to prestage into the
cache, wherein the data structure comprises a bit map
data structure having bit map values for addressable
locations in the storage system, wherein the bit map
values corresponding to the addressable locations
including the data to prestage into cache are set to one;

prestaging into the cache the data at the addressable
locations indicated in the data structure;

receiving a data request from the first processing unit for
data at the addressable locations indicated in the data
structure; and

returning to the first processing unit the requested data
from the cache.

38. The article of manufacture of claim 37, wherein the
data structure comprises a bit map data structure having bit
map values for addressable locations in the storage system,
wherein the second processing unit prestages into cache data
at the addressable locations having a corresponding bit map
value of one.

39. The article of manufacture of claim 37, wherein the
addressable locations in the data structure correspond to data
having a logical sequential ordering within the first process-
ing unit.

40. The article of manufacture of claim 39, wherein the
addressable locations in the storage system including the
data having the logical sequential ordering are at non-
contiguous addressable locations in the storage system.

41. The article of manufacture of claim 37, wherein the
storage system storage space is logically divided into mul-
tiple tracks, wherein each track includes one or more data
records, wherein each data record includes an index area
providing index information on the content of the data
record and a user data area including user data, wherein the
addressable locations indicated in the data structure com-
prise tracks in the storage system including the data records
to prestage into the cache.

42. The article of manufacture of claim 37, wherein the
data request from the first processing unit is for data that was
prestaged into cache in a command sequence preceding the
command sequence including the data request.

43. A computer readable memory device accessible to a
processing unit, wherein the memory device includes a
prestage command and a data structure capable of indicating
contiguous and non-contiguous addressable locations in a
storage system including data to prestage into a cache from
the storage system in preparation for data transfer operations
between a first processing unit and second processing unit,
wherein the data structure comprises a bit map data structure
having bit map values for addressable locations in the
storage system, wherein the bit map values corresponding to
the addressable locations including the data to prestage into 
cache are set to one, wherein the prestage command and the 
data structure are communicated from the first processing
unit to the second processing unit, and wherein the prestage 
command is capable of causing the second processing unit 
to prestage into the cache the data at the addressable 
locations indicated in the data structure.
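
To make the memory-resident prestage command and data structure of claim 43 concrete, the sketch below packs both into one byte string and unpacks them again. The message layout (one-byte opcode, four-byte first track, two-byte track count, then the bit map), the opcode value `0xF0`, and the function names are all hypothetical and are not taken from any real channel command set.

```python
import struct

PRESTAGE_OPCODE = 0xF0      # hypothetical opcode chosen for this example
HEADER_FORMAT = ">BIH"      # opcode, first track, track count (big-endian)


def pack_prestage(first_track, track_count, bitmap):
    """Serialize the command header followed by the bit map bytes."""
    header = struct.pack(HEADER_FORMAT, PRESTAGE_OPCODE, first_track, track_count)
    return header + bitmap


def unpack_prestage(message):
    """Recover the opcode and the list of tracks whose bit is set to one."""
    opcode, first_track, track_count = struct.unpack_from(HEADER_FORMAT, message)
    bitmap = message[struct.calcsize(HEADER_FORMAT):]
    tracks = [first_track + i for i in range(track_count)
              if bitmap[i // 8] & (0x80 >> (i % 8))]    # MSB-first, as assumed
    return opcode, tracks


message = pack_prestage(100, 16, bytes([0x98, 0x10]))
opcode, tracks = unpack_prestage(message)
# tracks == [100, 103, 104, 111]
```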

44. The memory device of claim 43, wherein the data 
structure comprises a bit map data structure having bit map 
values for addressable locations in the storage system, 
wherein the second processing unit prestages into cache data
at the addressable locations having a corresponding bit map
value of one. 

45. The memory device of claim 43, wherein the addres- 
sable locations in the data structure correspond to data 
having a logical sequential ordering within the first process- 
ing unit. 

46. The memory device of claim 45, wherein the addres- 
sable locations in the storage system including the data 
having the logical sequential ordering are at non-contiguous 
addressable locations in the storage system. 

47. An article of manufacture including prestage com- 
mands to prestage data into cache from a storage system in 
preparation for data transfer operations, wherein the storage 
system storage space is logically divided into multiple 
tracks, wherein each track includes one or more data 
records, wherein each data record includes an index area 
providing index information on the content of the data 
record and a user data area including user data, wherein a 
first processing unit communicates data transfer operations 
to a second processing unit that controls access to the storage 
system, the article of manufacture comprising computer 
readable storage media including at least one computer 
program embedded therein that is capable of causing the first 
processing unit to perform:

determining addressable locations in the storage system of 
data to prestage into cache; 

generating a data structure capable of indicating contigu- 
ous and non-contiguous addressable locations in the 
storage system including the data to prestage into the 
cache by: 

(i) generating a bit map data structure having bit map 
values for addressable locations in the storage sys- 
tem; and

(ii) setting to one the bit map values corresponding to
the addressable locations including the data to pre-
stage into cache;

transmitting a prestage command to the second processing
unit which controls access to the storage system, 
wherein the prestage command causes the second pro-
cessing unit to prestage into cache the tracks at the
addressable locations indicated in the data structure; 
and 

requesting data at the addressable locations indicated in 
the data structure, wherein the second processing unit 
returns the requested data from the cache.

48. An article of manufacture including prestage com-
mands to prestage data into cache from a storage system in 
preparation for data transfer operations, wherein the storage 
system storage space is logically divided into multiple 
tracks, wherein each track includes one or more data 
records, wherein each data record includes an index area 
providing index information on the content of the data 
record and a user data area including user data, wherein a 
first processing unit communicates data transfer operations 
to a second processing unit that controls access to the storage 
system, the article of manufacture comprising computer 
readable storage media including at least one computer 
program embedded therein that is capable of causing the 
second processing unit to perform: 

receiving a prestage command from the first processing 
unit and a data structure capable of indicating contigu- 
ous and non-contiguous addressable locations in the 
storage system including the data to prestage into the 
cache, wherein the data structure comprises a bit map 
data structure having bit map values for addressable 
locations in the storage system, wherein the bit map 
values corresponding to the addressable locations 
including the data to prestage into cache are set to one;

prestaging into the cache the tracks at the addressable
locations indicated in the data structure;

receiving a data request from the first processing unit for
data at the addressable locations indicated in the data 
structure; and 

returning to the first processing unit the requested data 
from the cache.